AUTOMATIC GROUPS AND KNUTH BENDIX WITH 
INFINITELY MANY RULES 



D. B. A. EPSTEIN AND PAUL J. SANDERS 

Abstract. It is shown how to use a small finite state automaton in two vari- 
ables in order to carry out part of the Knuth-Bendix process for rewriting 
words in a group. The main objective is to provide a substitute for the most 
space-demanding module of the existing software which attempts to find a 
shortlex-automatic structure for a group. The two-variable automaton can be 
used to store an infinite set of rules and to carry out fast reduction of arbi- 
trary words using this infinite set. We introduce a new operation, which we 
call welding, which applies to an arbitrary finite state automaton. In our con- 
text this operation is vital. We point out a small potential improvement in the 
subset algorithm for making a non-deterministic automaton deterministic. 



1. Introduction 

A celebrated result of Novikov and Boone asserts that the word problem for 
finitely presented groups is, in general, unsolvable. This means that a finite pre- 
sentation of a group has been written down, with the property that there is no al- 
gorithm whose input is a word in the generators, and whose output states whether 
or not the word is trivial. So, given a presentation of a group which one is unable 
to analyze, can any help at all be given by brute force methods, using a computer? 

The answer is that some help can be given with the kind of presentation that 
arises naturally in the work of many mathematicians, even though one can formally 
prove that there is no procedure that will always help. 

There are two general techniques for trying to determine, with the help of a 
computer, whether two words in a group are equal or not. One is the Todd- 
Coxeter coset enumeration process and the other is the Knuth-Bendix process. 
Todd-Coxeter is more adapted to finite groups which are not too large. We are 
mostly interested in groups which arise in the study of low dimensional topology. 
In particular they are infinite groups, and the number of words of length n rises 
exponentially with n. For this reason, Todd-Coxeter is not much use in practice. 
Well before Todd-Coxeter has had time to work out the structure of a large enough 
neighbourhood of the identity in the Cayley graph to be helpful, the computer is 
out of space. 

On the other hand, the Knuth-Bendix process is much better adapted to this 
task, and it has been used quite extensively, particularly by Sims, for example 
in connection with computer investigations into problems related to the Burnside 
problem. It has also been used to good effect by Holt and Rees in their automated
searching for isomorphisms and homomorphisms between two given finitely
presented groups (see [?]). In connection with searching for a shortlex-automatic
structure on a group (we say what this means in Section 5), Holt was the first
person to realize that the Knuth-Bendix process might be the right direction to
choose (see [?]). However, Knuth-Bendix will run for ever on even the most innocuous
hyperbolic triangle groups, which are perfectly easy to understand. Holt's
successful plan was to use Knuth-Bendix for a certain amount of time, decided 
heuristically, and then to interrupt Knuth-Bendix and use axiom-checking, a part 
of automatic group theory (see [?, Chapter 6]), to find an automatic structure on 
the group. Thus, using the concept of an automatic group as a mechanism for 
bringing Knuth-Bendix to a halt has been one of the philosophical bases for the 
work done at Warwick in this field almost from the beginning. In addition to the 
works already cited in this paragraph, the reader may wish to look at [?] and [?]. 

For a shortlex-automatic group, a minimal set of Knuth-Bendix rules may be 
infinite, but it is always a regular language, and therefore can be encoded by a 
finite state machine. In this paper, we carry this philosophical approach further, 
attempting to compute this finite state machine directly, and to carry out as much 
of the Knuth-Bendix process as possible using only approximations to this machine. 

Thus, we describe a setup that can handle an infinite regular set of Knuth-Bendix 
rewrite rules. For our setup to be effective, we need to make several assumptions. 
Most important is the assumption that we are dealing with a group, rather than 
with a monoid. Secondly, our procedures are perhaps unlikely to be of much help 
unless the group actually is shortlex-automatic. 

As a computer science byproduct of our work, we produce a new operation 
on automata, which we call welding. Although this is an operation which makes 
sense on the level of abstract languages, we do not see any use for it apart from 
those indicated in this paper, which is concerned very much with equations in 
groups. Another computer science byproduct is a small improvement which one 
can sometimes make in the process of determinizing a finite state automaton. Since 
determinization is potentially exponential, even a small improvement can be useful. 

Previous computer implementations of the semi-decision procedure to find the
shortlex-automatic structure on a group are essentially specializations of the
Knuth-Bendix procedure [?] to a string rewriting context together with fast, but
space-consuming, automaton-based methods of performing word reduction relative to
a finite set of shortlex-reducing rewrite rules. Since shortlex-automaticity of a
given finite presentation is, in general, undecidable, space-efficient approaches to
the Knuth-Bendix procedure are desirable. In this paper we present a new algorithm
which performs a Knuth-Bendix type procedure relative to a possibly infinite
regular set of shortlex-reducing rewrite rules, together with a companion word
reduction algorithm which has been designed with space considerations in mind.

We would like to thank Derek Holt for many conversations about this project, 
both in general and in detail. His help has, as always, been generous and useful. 

2. String rewriting 

In this section we review some standard material on string-rewriting, with the 
object of making this paper reasonably self-contained. Later sections will review 
standard material on automata and automatic groups. 

Definition 2.1. Let $A$ be a finite set (usually called the alphabet). We define
$A^*$ to be the set of strings of symbols in $A$. In other words, $A^*$ is the free monoid
generated by $A$, with multiplication defined by concatenation. The identity element
in this monoid is the empty string, denoted by $\epsilon$.

Definition 2.2. Given a finite alphabet $A$, a subset $R$ of $A^* \times A^*$ is known as a
rewrite system on $A^*$. The elements of $R$ are known as rewrite rules.

Definition 2.3. Associated with a rewrite system $R$ we define three relations $\to_R$,
$\to_R^*$ and $\leftrightarrow_R^*$ on $A^*$. For $u, v \in A^*$ we write $u \to_R v$ (and say that $u$ rewrites
to $v$) if there are strings $x, y \in A^*$ and a rewrite rule $(\lambda, \rho) \in R$ such that
$u = x\lambda y$ and $v = x\rho y$. This is also called an elementary reduction. The relation
$\to_R^*$ is the reflexive, transitive closure of $\to_R$ and the relation $\leftrightarrow_R^*$ is the
reflexive, symmetric, transitive closure of $\to_R$. The congruence $\leftrightarrow_R^*$ is called the
Thue congruence generated by $R$ and we denote the congruence class of a string
$w \in A^*$ by $[w]_R$.

If there is no infinite sequence $u_1 \to_R u_2 \to_R \cdots$ of rewrites we say that $R$ is
Noetherian. In such a system each congruence class contains at least one irreducible
string, i.e. an element $w \in A^*$ which contains no substring equal to the left-hand
side of any rewrite rule. In a Noetherian rewrite system any string $w$ is reduced to
an irreducible element of $[w]_R$ by a finite sequence of rewrites. If each congruence
class of a Noetherian system contains a unique irreducible then the word problem
in the quotient monoid $A^*/\leftrightarrow_R^*$ is solved by rewriting.
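
The reduction of a string to an irreducible can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `reduce_word`, the representation of rules as pairs of strings with non-empty left-hand sides, and the example rule set are all hypothetical, and the naive substring search is not the automaton-based reduction developed later in the paper.

```python
def reduce_word(word, rules):
    """Rewrite `word` until no left-hand side occurs in it as a substring.

    `rules` is a list of (lhs, rhs) pairs with non-empty lhs. Termination is
    only guaranteed if the system is Noetherian.
    """
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            pos = word.find(lhs)
            if pos != -1:
                # Elementary reduction: x lhs y -> x rhs y.
                word = word[:pos] + rhs + word[pos + len(lhs):]
                changed = True
                break
    return word


# Hypothetical example: the infinite cyclic group, with A standing for a^-1.
FREE_CANCELLATION = [("aA", ""), ("Aa", "")]
print(reduce_word("aaAAaA", FREE_CANCELLATION) == "")
```

Because this example system is confluent, the irreducible reached does not depend on the order in which the rules are tried.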

A rewrite system $R$ is called confluent if whenever $u, v, w \in A^*$ with $u \to_R^* v$ and
$u \to_R^* w$, there exists some string $z \in A^*$ with $v \to_R^* z$ and $w \to_R^* z$. Confluence can
easily be proved necessary and sufficient for each congruence class in a Noetherian
rewrite system to contain a unique irreducible. In this case elements of the monoid
$A^*/\leftrightarrow_R^*$ are represented by irreducible strings, and multiplication can be
defined by juxtaposition and reduction.

2.4. Critical pair analysis. For a finite Noetherian rewrite system $R$ the question
of unique irreducibles entails only a finite computation known as critical pair
analysis. If a finite Noetherian $R$ is not confluent one can easily prove that the
property must fail at one of a finite number of triples $(u, v, w)$. Such triples are
obtained by considering pairs of rules in $R$ whose left-hand sides have a non-trivial
overlap. For such a pair of rules $(\lambda_1, \rho_1)$, $(\lambda_2, \rho_2)$ there are two types of overlap.
First, a non-empty string $z$ may be a suffix of $\lambda_1 = s_1 z$ and a prefix of $\lambda_2 = z s_2$
(or vice versa). Second, $\lambda_2$ may be a substring of $\lambda_1$ (or vice versa) and we write
$\lambda_1 = s_1 \lambda_2 s_2$.

These cases are not disjoint. In particular, if one of $s_1$ and $s_2$ is trivial in the
second case, it can equally well be treated under the first case with $z$ equal either
to $\lambda_1$ or to $\lambda_2$.

2.5. First case of critical pair analysis. In the first case, the triple
$(u, v, w) = (s_1 z s_2, \rho_1 s_2, s_1 \rho_2)$. There are two ways of starting to reduce
$u = s_1 z s_2$, namely to $v = \rho_1 s_2$ and to $w = s_1 \rho_2$. Further reduction to irreducibles
either gives the same irreducible for each of the two computations, or else gives us
distinct irreducibles $v'$ and $w'$. In the latter situation we can augment $R$ either with
the rule $(v', w')$ or with $(w', v')$ provided the system obtained remains Noetherian.
(If it doesn't remain Noetherian with either choice, we will almost certainly have
to give up on the whole process.) Since $v'$ was previously congruent to $w'$, the
congruence on $A^*$ is unaltered by introducing such a rule.

Note that it is important to allow $(\lambda_1, \rho_1) = (\lambda_2, \rho_2)$ in the first case, provided
there is a $z$ which is both a proper suffix and a proper prefix of $\lambda_1 = \lambda_2$.

2.6. Second case of critical pair analysis. In the second case, the triple
$(u, v, w) = (\lambda_1, \rho_1, s_1 \rho_2 s_2)$. If $\rho_1$ and $s_1 \rho_2 s_2$ reduce to distinct irreducibles $v'$
and $w'$, we augment $R$ with a new rule $(v', w')$ or with $(w', v')$, provided the system
remains Noetherian.
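
The two kinds of overlap can be enumerated mechanically. The following Python sketch is illustrative only; the function name `critical_pairs` and the string-pair representation of rules are assumptions made here. To avoid listing the same overlap twice, the first case is restricted to a proper suffix and proper prefix, since an overlap in which $z$ is the whole of one left-hand side is already covered by the second case.

```python
def critical_pairs(rule1, rule2):
    """Return the triples (u, v, w) arising from overlaps of rule1 with rule2.

    Only overlaps in this order are listed: a suffix of lhs1 meeting a prefix
    of lhs2, or lhs2 occurring inside lhs1. Swap the arguments to obtain the
    "vice versa" cases.
    """
    (l1, r1), (l2, r2) = rule1, rule2
    triples = []
    # First case: l1 = s1 z and l2 = z s2, with z non-empty and proper in both.
    for k in range(1, min(len(l1), len(l2))):
        if l1[-k:] == l2[:k]:
            s1, s2 = l1[:-k], l2[k:]
            triples.append((l1 + s2, r1 + s2, s1 + r2))
    # Second case: l1 = s1 l2 s2, for every occurrence of l2 inside l1.
    if rule1 != rule2:
        start = l1.find(l2)
        while start != -1:
            s1, s2 = l1[:start], l1[start + len(l2):]
            triples.append((l1, r1, s1 + r2 + s2))
            start = l1.find(l2, start + 1)
    return triples


print(critical_pairs(("ab", "x"), ("bc", "y")))
```

For the hypothetical rules `("ab", "x")` and `("bc", "y")` the only overlap is $z = b$, giving $u = abc$ with the two one-step reductions $xc$ and $ay$.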

2.7. Omitting rules. In practice, it is important to remove rules which are re- 
dundant, as well as to add rules which are essential. Omitting rules is unnecessary 
in theory, provided that we have unlimited time and space at our disposal. In 
practice, if we don't omit rules, we are liable to be overwhelmed by unnecessary 
computation. Moreover, nearly all programs in computational group theory suffer 
from excessive demands for space. Indeed this is one of the reasons for developing 
the algorithms and programs discussed in this paper. So it is important to throw 
away information that is not needed and doesn't help. 

For this reason, in Knuth-Bendix programs one normally looks from time to
time at each rule $(\lambda, \rho)$ to see if it can and should be omitted. If a proper substring
of the left-hand side can be reduced, then we are in the situation of 2.6. If the two
reductions mentioned in 2.6 lead to the same irreducible, we omit $(\lambda, \rho)$ from the
set of rules. If the two reductions lead to different irreducibles, then we augment
the set of rules as described in 2.6 and again omit $(\lambda, \rho)$.

We also investigate whether the right-hand side $\rho$ of a rule $(\lambda, \rho)$ is reducible to
$\rho'$. If so, we can omit $(\lambda, \rho)$ from $R$ and replace it with the rule $(\lambda, \rho')$ without
changing the congruence on $A^*$.

2.8. Maintaining the congruence. The insertion of a rule $(\lambda, \rho)$, for which we
already know that $\lambda$ and $\rho$ are congruent, clearly does not change the congruence.

Suppose, on the other hand, that $(\lambda, \rho)$ is a rule where $\lambda$ is known to be congruent
to $\rho$ using only rules other than $(\lambda, \rho)$. Then $(\lambda, \rho)$ can be omitted without changing
the congruence.

The process of analysing critical pairs and augmenting or diminishing the rule 
set without changing the congruence on A* is known as the Knuth-Bendix Process. 
If this terminates, it gives a finite confluent rewriting system for the congruence. 
Usually it does not terminate and it produces new rules ad infinitum. At each 
augmentation we have to choose one of two new rules to insert and we have to 
ensure that the augmented system is still Noetherian in order for the procedure to 
continue. The choice is generally made using some total ordering on the elements 
of A*. 



2.9. Ordering. Given an alphabet $A$, a reduction ordering on $A^*$ is a well-ordering
which is invariant under left and right multiplication. Suppose we have a rewriting
system $R$ on $A^*$, such that, for each $(\lambda, \rho) \in R$, we have $\rho < \lambda$. Then we say that
$R$ is consistent with the reduction ordering.

If $R$ is consistent with a reduction ordering, then it is Noetherian. Moreover,
to augment the rule set during the Knuth-Bendix Completion Procedure without
destroying the Noetherian property, we have to choose either $(v', w')$, or $(w', v')$,
using the notation introduced above where $v'$ and $w'$ are certain elements in $A^*$.
We use $(v', w')$ if $v' > w'$ and $(w', v')$ if $w' > v'$. For more information on
Knuth-Bendix and rewriting see [?].

Given an ordering $<$ on a finite alphabet $A$ we can extend this to a well-ordering
on $A^*$ by defining $u < v$ to mean either (1) $|u| < |v|$, or (2) $|u| = |v|$ and $u$ precedes $v$
in the lexicographic order on $A^*$ induced by the ordering on $A$. This is clearly a
reduction ordering on $A^*$ and is termed the shortlex-ordering induced by $<$. It is
the only ordering we will use in this paper.
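
As a small illustration, the shortlex-ordering can be realised as a Python sort key, and an equation between two strings can then be oriented into a rule whose right-hand side is the smaller side. The names `shortlex_key` and `orient`, and the convention that the alphabet is passed as a string listing the letters in increasing order, are assumptions of this sketch.

```python
def shortlex_key(word, alphabet):
    """Sort key for the shortlex order induced by the order of `alphabet`."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    # Compare lengths first, then letter by letter.
    return (len(word), [rank[c] for c in word])


def orient(v, w, alphabet):
    """Orient the equation v = w as a rule (lhs, rhs) with rhs < lhs."""
    if v == w:
        return None  # the two sides coincide, so there is no rule to add
    if shortlex_key(v, alphabet) > shortlex_key(w, alphabet):
        return (v, w)
    return (w, v)


words = ["ba", "b", "aa", "", "ab", "a"]
print(sorted(words, key=lambda s: shortlex_key(s, "ab")))
print(orient("b", "aa", "ab"))
```

Sorting with this key lists shorter strings first and breaks ties lexicographically, so any rule produced by `orient` is consistent with the reduction ordering in the sense of 2.9.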

We order pairs $(\lambda, \rho) \in A^* \times A^*$ by setting $(\lambda', \rho') > (\lambda, \rho)$ if and only if either
$\lambda' > \lambda$ or $\lambda' = \lambda$ and $\rho' > \rho$. This is clearly a well-ordering on the set of pairs.

2.10. Infinite sets of rules. In theory, critical pair analysis can be undertaken
even on an infinite set of rules $R$, provided we are working with a reduction ordering.
We set $R_1 = R$. In the $n$-th step, we do critical pair analysis on all rules of $R_n$
such that the sum of the lengths of left-hand side and right-hand side is at most $n$.
The effect on $R_n$ is to delete some rules, namely those such that either the
right-hand side has been shown to be reducible or the left-hand side is reducible
without using the rule itself, and to insert others, namely those that arise in the
critical pair analysis. The resulting set of rules is $R_{n+1}$. We can form
$S = \bigcap_{m} \bigcup_{n \ge m} R_n$.
The congruence on $A^*$ induced by $R$ is the same as the congruence induced by $R_n$
for each $n$. It can also be proved to be the same as the congruence induced by
$S$. Moreover $S$ can be proved to be a confluent system. The $S$-irreducibles are in
one-to-one correspondence with the elements of the monoid $A^*/\leftrightarrow_R^*$.
If $R_1$ is finite, then $R_n$ is finite for each $n$.

Unfortunately, $S$ is sometimes not a recursive set, even if the $R_n$ are all finite,
so that it cannot be computed by a Turing machine. $S$ consists of exactly those
rules with irreducible right-hand sides and reducible left-hand sides, such that any
proper substring of the left-hand side is irreducible.

In our treatment, we will be dealing with an infinite set of rules defined implicitly 
by a finite state automaton. However, we will not attempt to perform Knuth- 
Bendix directly on this infinite set. 



2.11. Knuth-Bendix pass. One procedure for carrying out the Knuth-Bendix 
process is to divide the finite set S of rules found so far into three disjoint subsets. 
The first subset, called Considered, is the set of rules whose left-hand sides have 
been compared with each other and with themselves for overlaps. The second 
set of rules, called This, is the set of rules waiting to be compared with those in 
Considered. The third set, called New, consists of those rules most recently found. 
Here we only sketch the process. Full details are provided in Section [?]. 

The Knuth-Bendix process proceeds in phases, each of which is called a Knuth- 
Bendix pass. Each pass starts by looking at each rule in Considered and seeing 
whether it can be deleted as in 2.7. Consideration of an existing rule in Considered 
can lead to a new rule, in which case the new rule is added to New. 

Next, we look at each rule r in New to see if it is redundant. If it is redundant 
it is replaced by a non-redundant rule. The details will be given in 12.6. The 
non-redundant version of the rule is moved into This. 

We then look at each rule in This. Its left-hand side is compared with itself and 
with all the left-hand sides of rules in Considered, looking for overlaps as in 2.5. Any 
new rules found are added to New. Then r is moved into Considered. Eventually 
This becomes empty. 

We then proceed to the next pass. 
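
The pass structure can be illustrated by a deliberately naive Python sketch for a finite rule set. It is not the procedure of this paper: rules are plain string pairs, reduction is by substring search, the deletion of redundant rules from Considered (2.7) is left out, and the names `knuth_bendix`, `overlaps`, `reduce_word` and the bound `max_passes` are hypothetical.

```python
def shortlex_key(w, alphabet):
    return (len(w), [alphabet.index(c) for c in w])


def reduce_word(word, rules):
    """Apply rules (lhs, rhs) until no left-hand side occurs in `word`."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            pos = word.find(lhs)
            if pos != -1:
                word = word[:pos] + rhs + word[pos + len(lhs):]
                changed = True
                break
    return word


def overlaps(rule1, rule2):
    """Pairs (v, w) of one-step reductions of a common word, as in 2.5 and 2.6."""
    (l1, r1), (l2, r2) = rule1, rule2
    pairs = []
    for k in range(1, min(len(l1), len(l2))):      # l1 = s1 z, l2 = z s2
        if l1[-k:] == l2[:k]:
            pairs.append((r1 + l2[k:], l1[:-k] + r2))
    start = l1.find(l2) if rule1 != rule2 else -1  # l1 = s1 l2 s2
    while start != -1:
        pairs.append((r1, l1[:start] + r2 + l1[start + len(l2):]))
        start = l1.find(l2, start + 1)
    return pairs


def knuth_bendix(equations, alphabet, max_passes=20):
    """Return a list of rules, or None if max_passes is exhausted."""
    considered, new = [], list(equations)
    for _ in range(max_passes):
        if not new:
            return considered
        # Move New into This: reduce both sides, then orient by shortlex.
        this = []
        for v, w in new:
            v = reduce_word(v, considered + this)
            w = reduce_word(w, considered + this)
            if v != w:
                big, small = sorted((v, w), reverse=True,
                                    key=lambda s: shortlex_key(s, alphabet))
                this.append((big, small))
        new = []
        # Compare each rule of This with itself and with Considered.
        while this:
            rule = this.pop()
            considered.append(rule)
            for other in considered:
                new.extend(overlaps(rule, other))
                new.extend(overlaps(other, rule))
    return None


# Hypothetical example: the Klein four-group <a, b | aa = bb = 1, ba = ab>.
RULES = knuth_bendix([("aa", ""), ("bb", ""), ("ba", "ab")], "ab")
print(sorted(RULES))
```

In this small example every critical pair found in the first pass already reduces to a common irreducible, so the three input rules form a confluent system and the process stops after the second pass.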



3. Automata and operations on them 

This section is devoted to standard material. 

Definition 3.1. A non-deterministic finite state automaton (abbreviated NFA) is
defined to be a quintuple $(S, A, \mu, F, S_0)$, where $S$ is a finite set called the set of
states, $A$ is a finite set called the alphabet, $\mu$ is a set of triples, called the set of
arrows, of the form $(s, x, t)$ with $s, t \in S$ and $x \in A$ or $x = \epsilon$, where $\epsilon$ is defined in
2.1. $F \subseteq S$ is called the set of final states and $S_0 \subseteq S$ is called the set of initial
states. The source of an arrow $(s, x, t)$ is defined to be $s$ and the target is defined
to be $t$. Final states are sometimes called accept states, initial states are sometimes
called start states and arrows are sometimes called transitions.

We define a path of arrows in a non-deterministic automaton $M = (S, A, \mu, F, S_0)$
to be a finite sequence of the form $(u_0, x_1, u_1, \ldots, x_n, u_n)$, where $n \ge 0$ and, for
each $i$ with $1 \le i \le n$, $(u_{i-1}, x_i, u_i) \in \mu$. The length of the path is $n$. The label
associated to a path is the element $x_1 \cdots x_n \in A^*$. If $n = 0$, the label is $\epsilon \in A^*$.
The language $L(M)$ accepted by $M$ is defined to be the set of all labels of paths of
arrows starting with some $u_0 \in S_0$ and ending with some $u_n \in F$. If a subset of
$A^*$ is equal to $L(M)$ for some non-deterministic automaton $M$, then the subset is
called a regular language.
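
To make the definition concrete, here is a minimal Python sketch that decides membership in $L(M)$ by tracking the set of states reachable along paths with a given label. The encoding is an assumption of this sketch: an NFA is a tuple in the order $(S, A, \mu, F, S_0)$, arrows are triples, and the empty Python string stands for $\epsilon$.

```python
EPS = ""  # label standing for an epsilon-arrow in this sketch


def epsilon_closure(subset, arrows):
    """All states reachable from `subset` along paths of epsilon-arrows."""
    closure, stack = set(subset), list(subset)
    while stack:
        s = stack.pop()
        for (src, label, tgt) in arrows:
            if src == s and label == EPS and tgt not in closure:
                closure.add(tgt)
                stack.append(tgt)
    return closure


def accepts(nfa, word):
    """True if some path from an initial to a final state has label `word`."""
    states, alphabet, arrows, finals, initials = nfa
    current = epsilon_closure(initials, arrows)
    for x in word:
        step = {t for (s, label, t) in arrows if s in current and label == x}
        current = epsilon_closure(step, arrows)
    return bool(current & set(finals))


# Hypothetical NFA accepting the strings of the form a...ab...b.
A_STAR_B_STAR = ({0, 1}, {"a", "b"},
                 {(0, "a", 0), (0, EPS, 1), (1, "b", 1)},
                 {1}, {0})
print(accepts(A_STAR_B_STAR, "aabb"), accepts(A_STAR_B_STAR, "ba"))
```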



Definition 3.2. A partially deterministic finite state automaton (abbreviated PDFA)
is defined to be an NFA $M = (S, A, \mu, F, S_0)$ which contains exactly one initial state,
has no $\epsilon$-arrows and where for each $s \in S$ and $x \in A$ there is at most one arrow of
the form $(s, x, t)$.



Definition 3.3. A deterministic finite state automaton (abbreviated DFA) is defined
to be an NFA $(S, A, \mu, F, S_0)$, in which there are no $\epsilon$-arrows, $S_0$ is a singleton
whose unique element $s_0$ is called the initial state, and such that, for each $s \in S$
and $x \in A$, there is exactly one arrow of the form $(s, x, t)$.

Given a non-deterministic automaton $M$ and a subset $T$ of the set of states $S$ we
define the $\epsilon$-closure $\mathcal{E}(T)$ of $T$ to be the subset of $S$ which one can reach from some
element of $T$ by following a path of $\epsilon$-arrows. A non-deterministic automaton can be
converted into a deterministic automaton accepting the same language as follows.
The states of the new automaton are the $\epsilon$-closed subsets of $S$ (one of these states
is the null set). Given an $\epsilon$-closed set $T$ and $x \in A$, we define an arrow $(T, x, P)$,
where $P$ is obtained by taking the set of targets of all $x$-arrows with source in $T$
and then taking its $\epsilon$-closure. The initial state is the $\epsilon$-closure of $S_0$. A state of the
new deterministic automaton is final if and only if it contains a final state of $M$.

This proves the following standard theorem. 

Theorem 3.4. For any NFA $M$ there is a DFA $N$ with $L(N) = L(M)$.
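
The conversion described before Theorem 3.4 can be sketched in Python as follows. The sketch builds only those $\epsilon$-closed subsets that are reachable from the initial state; the tuple encoding of the NFA in the order $(S, A, \mu, F, S_0)$, the use of the empty string for $\epsilon$, and the names `determinize` and `dfa_accepts` are assumptions made for illustration.

```python
EPS = ""  # label standing for an epsilon-arrow in this sketch


def epsilon_closure(subset, arrows):
    closure, stack = set(subset), list(subset)
    while stack:
        s = stack.pop()
        for (src, label, tgt) in arrows:
            if src == s and label == EPS and tgt not in closure:
                closure.add(tgt)
                stack.append(tgt)
    return frozenset(closure)


def determinize(nfa):
    """Subset construction restricted to the reachable epsilon-closed subsets."""
    states, alphabet, arrows, finals, initials = nfa
    start = epsilon_closure(initials, arrows)
    table, seen, todo = {}, {start}, [start]
    while todo:
        T = todo.pop()
        for x in sorted(alphabet):
            targets = {t for (s, label, t) in arrows if s in T and label == x}
            P = epsilon_closure(targets, arrows)
            table[(T, x)] = P
            if P not in seen:
                seen.add(P)
                todo.append(P)
    accepting = {T for T in seen if T & frozenset(finals)}
    return seen, table, accepting, start


def dfa_accepts(dfa, word):
    seen, table, accepting, state = dfa
    for x in word:
        state = table[(state, x)]
    return state in accepting


# Hypothetical NFA accepting the strings of the form a...ab...b.
NFA = ({0, 1}, {"a", "b"}, {(0, "a", 0), (0, EPS, 1), (1, "b", 1)}, {1}, {0})
DFA = determinize(NFA)
print(len(DFA[0]), dfa_accepts(DFA, "aabb"), dfa_accepts(DFA, "aba"))
```

In this example three subsets are reachable, one of which is the null set; it acts as the failure state of the resulting DFA.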



Note 3.5. We will use the abbreviation FSA to denote an automaton which is a 
DFA or an NFA or a PDFA. 






Computationally, the procedure of finding the $\epsilon$-closed subsets and the arrows
between them is known as the subset construction and this is central to our word
reduction algorithm. There is a theoretical exponential blow-up in the subset
construction which is known to be unavoidable in general. In the cases which come
up in practice in our work, the subset construction can certainly be a problem, but
is often not as bad as the worst case analysis seems to suggest. The implementation
need only construct those $\epsilon$-closed subsets which can be reached from $S_0$.
The space and time demands of the procedure are proportional to the number of
such subsets. We will also use lazy evaluation to reduce the worst effects of this
exponential blow-up. This will be described later.

For a general DFA $M$, the process of finding a DFA $M'$ with $L(M') = L(M)$,
such that the number of states of $M'$ is minimal, is known as minimization. The
existence and uniqueness (up to isomorphism) of such an automaton is known as
the Myhill-Nerode theorem and many practical algorithms exist to find $M'$ given
$M$; for a detailed survey and comparisons see [?].

In order to define what is meant by an automatic group we need to first formalize
what it means for an automaton to accept pairs of strings over an alphabet $A$.
Consider, for example, the pair of strings $(abb, ccd)$. We regard this pair as a
string $(a, c)(b, c)(b, d)$ over the product alphabet $A \times A$. If the pair of strings is
$(abb, ccdc)$, then we have to pad the shorter of the two strings to make them the
same length, regarding this pair as the string of length four $(a, c)(b, c)(b, d)(\$, c)$. In
general, given an arbitrary pair of strings $(u, v) \in A^* \times A^*$, we regard this instead
as a string of pairs by adjoining a padding symbol $\$$ to $A$ and then "padding" the
shorter of $u$ and $v$ so that both strings have the same length. We obtain a string
over $(A \cup \{\$\}) \times (A \cup \{\$\})$. The alphabet $A \cup \{\$\}$ is denoted $A^+$ and is called the
padded extension of $A$. The result of padding an arbitrary pair $(u, v)$ is denoted
$(u, v)^+$. A string $w \in (A^+)^* \times (A^+)^*$ is called padded if there exist $u, v \in A^*$ with
$w = (u, v)^+$ (in other words, at most one of the two components of $w$ ends with a
padding symbol).

A set of pairs of strings over $A$ is called regular if the corresponding set of padded
strings is a regular language over the product alphabet $A^+ \times A^+$.
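
The padding convention is easy to state in code. In the following Python sketch the names `pad_pair` and `is_padded` are hypothetical, letters are single characters, and the character `$` is assumed not to belong to $A$.

```python
PAD = "$"  # padding symbol, assumed not to be a letter of A


def pad_pair(u, v):
    """Return (u, v)^+ as a list of pairs over the padded alphabet."""
    n = max(len(u), len(v))
    return list(zip(u.ljust(n, PAD), v.ljust(n, PAD)))


def is_padded(w):
    """True if the list of pairs w equals (u, v)^+ for some u, v over A."""
    u = "".join(x for x, _ in w).rstrip(PAD)
    v = "".join(y for _, y in w).rstrip(PAD)
    return PAD not in u and PAD not in v and pad_pair(u, v) == list(w)


print(pad_pair("abb", "ccdc"))
```

The call above reproduces the example in the text: the pair $(abb, ccdc)$ becomes the string of length four whose last letter is $(\$, c)$.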

We will need two standard definitions when dealing with finite state automata. 

Definition 3.6. Let $M$ be an FSA. We define its reversal $\mathrm{Rev}(M)$ to be the FSA 
obtained from M by taking the same set of states, interchanging the subsets of 
initial and final states, and then reversing the direction of all arrows. The reversal 
of a DFA is in general an NFA rather than a DFA. 



Definition 3.7. An FSA is called trim if each state has an accepted path of arrows 
passing through it. 
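
Both definitions translate directly into code. The Python sketch below uses the tuple order $(S, A, \mu, F, S_0)$ for an FSA; the function names `reverse`, `reachable` and `trim` are assumptions of this illustration. A state lies on an accepted path exactly when it can be reached from an initial state and a final state can be reached from it, and the second condition is a reachability question in the reversal.

```python
def reverse(fsa):
    """Rev(M): same states, initial and final states swapped, arrows reversed."""
    states, alphabet, arrows, finals, initials = fsa
    return (states, alphabet, {(t, x, s) for (s, x, t) in arrows},
            set(initials), set(finals))


def reachable(starts, arrows):
    """States reachable from `starts` along arrows with any labels."""
    seen, stack = set(starts), list(starts)
    while stack:
        s = stack.pop()
        for (src, _, tgt) in arrows:
            if src == s and tgt not in seen:
                seen.add(tgt)
                stack.append(tgt)
    return seen


def trim(fsa):
    """Keep only the states lying on some accepted path of arrows."""
    states, alphabet, arrows, finals, initials = fsa
    keep = reachable(initials, arrows) & reachable(finals, reverse(fsa)[2])
    return (keep, alphabet,
            {(s, x, t) for (s, x, t) in arrows if s in keep and t in keep},
            set(finals) & keep, set(initials) & keep)


# Hypothetical FSA in which state 3 is a dead end.
FSA = ({0, 1, 2, 3}, {"a"}, {(0, "a", 1), (1, "a", 2), (0, "a", 3)}, {2}, {0})
print(trim(FSA)[0])
```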



4. A modified determinization algorithm 

In this section we discuss a modification to the usual determinization algorithm
for turning an NFA into a DFA. Let $N$ be an NFA. The proof that $N$ can be
determinized is discussed just before the statement of Theorem 3.4. Let $M$ be the
corresponding determinized automaton, so that a state of $M$ is a subset of states
of $N$. In practice, to find $M$, we start with the $\epsilon$-closure of the set of initial states
of $N$ and proceed inductively. If we have found a state $s$ of $M$ as a subset of the
set of states of $N$, we fix some $x \in A$, and apply $x$ in all possible ways to all $t \in s$,
where $t$ is a state of $N$. We then follow with $\epsilon$-arrows to form an $\epsilon$-closed subset of
states of $N$. This gives us the result of applying $x$ to $s$. The modification we wish
to make to the usual subset construction is now explained and justified.

We will denote by M' the modified version of M thus obtained. M' is a DFA 
which accepts the same language as M and N, but the structure of M' might be a 
little simpler than that of M. 

Suppose $p$ is a state of the NFA $N$. Let $N_p$ be the same automaton as $N$, except
that the only initial state is $p$. Suppose $p$ and $q$ are distinct states of $N$ and that
$L(N_p) \subseteq L(N_q)$. Suppose also that the $\epsilon$-closure of $q$ does not include $p$. Under
these circumstances, we can modify the subset construction as follows. As before,
we start with the $\epsilon$-closure of the set of initial states of $N$. We follow the same
procedure for defining the arrows and states of $M'$ as for $M$, except that, whenever
we construct a subset containing both $p$ and $q$, we change the subset by omitting
$p$.

The situation can be generalized. Suppose that, for $1 \le i \le k$, $p_i$ and $q_i$ are
states of $N$. We assume that all $2k$ states are distinct from each other and that,
for each pair $(i, j)$, the $\epsilon$-closure of $q_i$ does not include $p_j$. Suppose further that,
for each $i$, $L(N_{p_i}) \subseteq L(N_{q_i})$. We follow the same procedure for defining the arrows
and states of $M'$ as for $M$, except that, whenever we construct a subset containing
both $p_i$ and $q_i$, we change the subset by omitting $p_i$.

Theorem 4.1. Under the above hypotheses, $L(M') = L(M)$.

Proof. Consider a string $w = x_1 \cdots x_n \in A^*$ which is accepted by $N$ via the path
of arrows in $N$

\[ (v_0, \epsilon^*, u_1, x_1, v_1, \ldots, v_{n-1}, \epsilon^*, u_n, x_n, v_n, \epsilon^*, u_{n+1}). \]

This means that, for each $i$ with $1 \le i \le n$, there is an $x_i$-arrow in $N$ from $u_i$ to $v_i$
and, for each $i$ with $0 \le i \le n$, $u_{i+1}$ is in the $\epsilon$-closure of $v_i$. Moreover $v_0$ is an
initial state and $u_{n+1}$ is a final state.

Suppose inductively that after reading $x_1 \cdots x_{i-1}$, $M'$ is in state $s_{i-1}$. We assume
inductively that we have a path of arrows in $N$

\[ (u_i^i, x_i, v_i^i, \epsilon^*, u_{i+1}^i, \ldots, u_n^i, x_n, v_n^i, \epsilon^*, u_{n+1}^i), \]

such that $u_i^i \in s_{i-1}$ and $u_{n+1}^i$ is a final state.

The induction starts with $i = 1$ and $s_0$ the initial state of $M'$. We form $s_0$ by
taking all initial states of $N$, and taking their $\epsilon$-closure. If this subset of states of
$N$ contains both $p_j$ and $q_j$, then $p_j$ is omitted from $s_0$, the initial state of $M'$.

If $u_1 \notin s_0$, then we must have $u_1 = p_j$ for some $j$, with $q_j \in s_0$. Now
$w \in L(N_{p_j}) \subseteq L(N_{q_j})$. It follows that we can take $u_1^1$ in the $\epsilon$-closure of $q_j$ and then
define the rest of the path of arrows for the case $i = 1$. Since $q_j \in s_0$ and $u_1^1$ is in
the $\epsilon$-closure of $q_j$, $u_1^1$ is not equal to any of the $p_r$. So $u_1^1 \in s_0$ and the induction
can start.

Now suppose we have a path of arrows

\[ (u_i^i, x_i, v_i^i, \epsilon^*, u_{i+1}^i, \ldots, u_n^i, x_n, v_n^i, \epsilon^*, u_{n+1}^i), \]

in $N$ such that $u_i^i \in s_{i-1}$ and $u_{n+1}^i$ is a final state of $N$. We define $s_i$ from $s_{i-1}$
in the usual way, applying $x_i$ in all possible ways to all states in $s_{i-1}$, obtaining in
particular $v_i^i$, and then taking the $\epsilon$-closure, obtaining in particular $u_{i+1}^i$. Finally, if,
for some $r$, $s_i$ contains both $p_r$ and $q_r$, then $p_r$ is deleted from $s_i$ before it becomes
a state of $M'$.

It now follows that either $u_{i+1}^i \in s_i$, or else, for some $r$ with $1 \le r \le k$, $u_{i+1}^i = p_r$,
$q_r \in s_i$ and $p_r \notin s_i$. In the first case we define $u_j^{i+1} = u_j^i$ and $v_j^{i+1} = v_j^i$ for
$j > i$ and the induction step is complete. In the second case, using the fact that
$x_{i+1} \cdots x_n \in L(N_{p_r}) \subseteq L(N_{q_r})$, we see that we can take $u_{i+1}^{i+1}$ in the $\epsilon$-closure of
$q_r$ and then define the rest of the path of arrows. Since $q_r \in s_i$ and $u_{i+1}^{i+1}$ is in the
$\epsilon$-closure of $q_r$, $u_{i+1}^{i+1}$ is not equal to any $p_s$ and so $u_{i+1}^{i+1} \in s_i$. This completes
the induction step.

At the end of the induction, $M'$ has read all of $w$ and is in state $s_n$. We also
have the final state $u_{n+1}^{n+1} \in s_n$, so that $w$ is accepted by $M'$.

Conversely, suppose $w$ is accepted by $M'$. It follows easily by induction that if
$M'$ is in state $s_i$ after reading the prefix $x_1 \cdots x_i$ of $w$, then each state $u \in s_i$ can be
reached from some initial state of $N$ by a sequence of arrows labelled successively
$x_1, \ldots, x_i$, possibly interspersed with $\epsilon$-arrows. Now $s_n$ must contain a final state,
and so $w$ is accepted by $N$. □



Remark 4.2. The practical usage of this theorem clearly depends on having an
efficient way of determining when the condition $L(N_p) \subseteq L(N_q)$ is satisfied. Later
we will see examples of such tests which cost virtually nothing to implement but
have the potential to save an appreciable amount of both space and time.
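
The modification can be illustrated by adding one pruning step to a plain subset construction. The Python sketch below is hypothetical throughout: the parameter `dominated` is a list of pairs $(p, q)$ for which the caller is assumed to have verified the hypotheses of Theorem 4.1, and the example NFA and names are chosen only for illustration.

```python
from itertools import product

EPS = ""  # label standing for an epsilon-arrow in this sketch


def determinize(nfa, dominated=()):
    """Subset construction; p is dropped from any subset that also contains q."""
    states, alphabet, arrows, finals, initials = nfa

    def closure(subset):
        result, stack = set(subset), list(subset)
        while stack:
            s = stack.pop()
            for (src, label, tgt) in arrows:
                if src == s and label == EPS and tgt not in result:
                    result.add(tgt)
                    stack.append(tgt)
        return result

    def prune(subset):
        drop = {p for (p, q) in dominated if p in subset and q in subset}
        return frozenset(subset - drop)

    start = prune(closure(initials))
    table, seen, todo = {}, {start}, [start]
    while todo:
        T = todo.pop()
        for x in sorted(alphabet):
            P = prune(closure({t for (s, l, t) in arrows if s in T and l == x}))
            table[(T, x)] = P
            if P not in seen:
                seen.add(P)
                todo.append(P)
    accepting = {T for T in seen if T & frozenset(finals)}
    return seen, table, accepting, start


def dfa_accepts(dfa, word):
    seen, table, accepting, state = dfa
    for x in word:
        state = table[(state, x)]
    return state in accepting


# Hypothetical NFA for "contains aa". Every string is accepted from state 2,
# so L(N_1) is contained in L(N_2) and the pair (p, q) = (1, 2) is admissible.
NFA = ({0, 1, 2}, {"a", "b"},
       {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "a", 2),
        (2, "a", 2), (2, "b", 2)}, {2}, {0})
PLAIN = determinize(NFA)
PRUNED = determinize(NFA, dominated=[(1, 2)])
WORDS = ["".join(w) for n in range(6) for w in product("ab", repeat=n)]
print(len(PLAIN[0]), len(PRUNED[0]))
```

In this example the unmodified construction reaches four subsets while the pruned one reaches three, and the two automata agree on all strings of length at most five.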

5. Automatic groups 

Definition 5.1. A group $G$ is called automatic if there exists a finite inverse closed
set $A$ of monoid generators of $G$ and a regular language $L$ over $A$ satisfying the
following two properties.

1. The natural monoid epimorphism $A^* \to G$ remains surjective when restricted
to $L$.

2. The set

\[ \{(u, v) : u, v \in L \text{ and } \overline{ux} = \overline{v} \text{ for some } x \in A \cup \{\epsilon\}\} \qquad (1) \]

is regular.

An FSA $W$ with $L(W) = L$ is called a word acceptor for $G$. A word acceptor
together with an FSA accepting the language (1) is called an automatic structure
for $G$ relative to $A$.

This definition is succinct but suppresses the geometry which lies behind the
importance of this class of groups. Given a group $G$ generated by a finite subset $A$,
the Cayley graph $C(G, A)$ is the graph whose vertices are the elements of $G$ and
where an edge joins two vertices $g$, $h$ if and only if there is a generator $a \in A$ with
$ga = h$. Denoting images under the natural epimorphism $A^* \to G$ by overscores
and the length of a string $w \in A^*$ by $|w|$ we define a metric on $C(G, A)$ by letting

\[ d(g, h) = \min\{|w| : w \in A^* \text{ with } \overline{w} = g^{-1}h\}. \]

This is termed the word metric. We get the same metric by taking each edge of
the Cayley graph and giving it length one. This makes the Cayley graph into a
geodesic space. For $w \in A^*$ and $i \in \mathbb{N}$, we denote by $w(i)$ the prefix of $w$ of length
$i$ (for $i \ge |w|$ this equals $w$).

Given two words $u, v \in A^*$ and a positive real number $k$, we say that $u$ and $v$
fellow-travel with constant $k$ in $C(G, A)$ if the group elements

\[ WD(G, A, u, v) = \{\overline{u(i)}^{-1}\,\overline{v(i)} : i \in \mathbb{N}\} \qquad (2) \]

lie in the ball of radius $k$ around the identity element of the group. We then have
the following geometrical characterization of an automatic group.

Theorem 5.2 ([?, Theorem 2.3.5]). Let $G$ be a group generated by a finite inverse
closed set of monoid generators $A$, and $L$ a regular language over $A$ mapping onto
$G$ under the restriction of the natural epimorphism $A^* \to G$. Then $L$ satisfies
property 2 of Definition 5.1 if and only if there exists a constant $k \ge 0$ such that
for any $u, v \in L$, if $d(\overline{u}, \overline{v}) \le 1$ in $C(G, A)$ then $u$ and $v$ fellow-travel with constant
$k$ in $C(G, A)$.

It follows immediately that, for an automatic group, the union of the sets
$WD(G, A, u, v)$ taken over all pairs $(u, v)$ with $u, v \in L$ and $d(\overline{u}, \overline{v}) \le 1$, is a finite
set. Here $d$ is the distance in the Cayley graph. This is also the minimal length of a
word over $A$ representing $\overline{u}^{-1}\overline{v}$.
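
For a concrete feel for the sets $WD(G, A, u, v)$, the following Python sketch computes them in the free abelian group of rank two, chosen here purely as an illustration because group elements are integer vectors and the word metric for the standard generators is the sum of the absolute values of the coordinates. The generator names (`A`, `B` for the inverses of `a`, `b`) and the function names are assumptions of this sketch.

```python
# Standard generators of Z^2 and their inverses, written additively.
GENS = {"a": (1, 0), "A": (-1, 0), "b": (0, 1), "B": (0, -1)}


def evaluate(word):
    """Image of a word in Z^2 under the natural epimorphism."""
    x = y = 0
    for c in word:
        dx, dy = GENS[c]
        x, y = x + dx, y + dy
    return (x, y)


def word_differences(u, v):
    """The set of elements u(i)^-1 v(i), written additively as v(i) - u(i)."""
    diffs = set()
    # Prefixes stabilise once i reaches the length of the longer word.
    for i in range(max(len(u), len(v)) + 1):
        ux, uy = evaluate(u[:i])
        vx, vy = evaluate(v[:i])
        diffs.add((vx - ux, vy - uy))
    return diffs


def fellow_travel(u, v, k):
    """True if every word difference lies in the ball of radius k."""
    return all(abs(x) + abs(y) <= k for (x, y) in word_differences(u, v))


print(word_differences("ab", "ba"), fellow_travel("ab", "ba", 1))
```

The words `ab` and `ba` represent the same element but are at distance two after one letter, so they fellow-travel with constant 2 and not with constant 1.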

The union of this finite set with the set $A$ of generators plus the identity of $G$ is
called the set of word differences $WD(G, A)$ of the automatic structure. $WD(G, A)$
can be regarded as a PDFA over the alphabet $A^+ \times A^+$ where an arrow labelled
$(x, y) \in A^+ \times A^+$ goes from the word difference $w_1$ to the word difference $w_2$ if
and only if $\overline{x}^{-1} w_1 \overline{y} = w_2$. We extend the domain of the epimorphism $A^* \to G$ to
$(A^+)^*$ by sending the padding symbol to the identity of $G$. The initial state is the
identity of $G$ and this is also the only final state. The resulting automaton is known
as the word difference automaton of the automatic structure. The goal of our main
algorithm is to calculate this automaton starting from a finite presentation of a
shortlex-automatic group (defined below). For definitive information on automatic
groups see the book [?].

We can now give the definition of the class of groups we are chiefly interested in. 

Definition 5.3. A group $G$ is called shortlex-automatic with respect to a finite
inverse closed well-ordered set of monoid generators $(A, <)$ if

1. $G$ is automatic with respect to $A$.

2. A string $w \in A^*$ is accepted by the word acceptor if and only if $w$ is the least
element under the induced shortlex-ordering of $\{v : v \in A^* \text{ and } \overline{v} = \overline{w}\}$.

6. Welding 

In this section we describe an operation, which we call welding, on FSAs which 
is central to our Knuth-Bendix procedure. The motivation for this operation is 
postponed to Section 7.

Definition 6.1. An FSA is called welded if it is partially deterministic, trim and
has a (partially) deterministic reversal. These conditions imply that, given $x \in A$
and a state t, there is at most one x-arrow with target t and also that there is
exactly one initial state and one final state.

Given a trim non-empty NFA M, we can form a welded automaton from it as
follows. Given any $\epsilon$-arrow $(s, \epsilon, t)$, we may identify s with t. Given distinct initial
states $s_1$ and $s_2$, we may identify $s_1$ with $s_2$. Given distinct final states $t_1$ and $t_2$,
we may identify $t_1$ with $t_2$. Given distinct arrows $(s, x, t_1)$ and $(s, x, t_2)$, we may
identify $t_1$ with $t_2$. Given distinct arrows $(s_1, x, t)$ and $(s_2, x, t)$, we may identify
$s_1$ with $s_2$. Immediately after any identification of two states, we change the set of
arrows accordingly, omitting any $\epsilon$-arrow from a state to itself. Since the number
of states continually decreases, this process must come to an end, and at this point
the automaton is welded.
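
The identifications just described can be sketched with a union-find structure. This is only an illustration under stated assumptions: the names `weld`, `chain` and `accepts` are hypothetical, an arrow is a triple (source, label, target) with label `None` standing for an $\epsilon$-arrow, and the input is assumed to be trim. It is not the implementation used by the authors.

```python
def weld(arrows, initials, finals):
    """Weld a trim NFA by repeatedly identifying states.

    Returns (arrows, initial, final) of the welded automaton, with every
    state replaced by the representative of its class."""
    parent = {}

    def find(s):
        parent.setdefault(s, s)
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    def union(s, t):
        s, t = find(s), find(t)
        if s == t:
            return False
        parent[t] = s
        return True

    arrows, initials, finals = list(arrows), list(initials), list(finals)
    for s in initials[1:]:
        union(initials[0], s)              # identify all initial states
    for t in finals[1:]:
        union(finals[0], t)                # identify all final states
    for s, x, t in arrows:
        if x is None:
            union(s, t)                    # identify the ends of epsilon-arrows

    changed = True
    while changed:                         # every productive pass merges classes
        changed = False
        out, into = {}, {}
        for s, x, t in arrows:
            if x is None:
                continue
            s, t = find(s), find(t)
            if union(out.setdefault((s, x), t), t):   # same source and label
                changed = True
            if union(into.setdefault((t, x), s), s):  # same target and label
                changed = True
    welded = {(find(s), x, find(t)) for s, x, t in arrows if x is not None}
    return welded, find(initials[0]), find(finals[0])


def chain(name, word):
    """A linear automaton accepting the single word `word`."""
    return [((name, i), x, (name, i + 1)) for i, x in enumerate(word)]


def accepts(welded, start, end, word):
    step = {(s, x): t for s, x, t in welded}
    state = start
    for x in word:
        if (state, x) not in step:
            return False
        state = step[(state, x)]
    return state == end


# Welding the disjoint union of automata for "acd" and "abcd".
m = chain("p", "acd") + chain("q", "abcd")
welded, start, end = weld(m, [("p", 0), ("q", 0)], [("p", 3), ("q", 4)])
print(len(welded), accepts(welded, start, end, "abbbcd"))
```

In this small example the welded automaton has four arrows on four states and accepts every word of the form a b^k c d, although only two such words were welded together.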

Theorem 6.2. Given a trim non-empty NFA M, all welded automata obtained
from it by a process like that described above are isomorphic to each other; that
is, the welded automaton Q is independent of the order in which identifications are
made. Moreover Q depends only on the language L(M). Q is the minimal PDFA
accepting L(Q). It follows that welding can be regarded as an operation on regular
languages.

Proof. For each $x \in A$, let $x^{-1}$ be its formal inverse and let $A^{-1}$ be the set of these
formal inverses. We form from M an automaton over $A \cup A^{-1}$ by adjoining an
arrow of the form $(t, x^{-1}, s)$ for each arrow $(s, x, t)$ of M, and adjoining an arrow
$(t, \epsilon, s)$ for each arrow $(s, \epsilon, t)$. We also adjoin $(s_1, \epsilon, s_2)$ if $s_1$ and $s_2$ are either both
initial states or both final states. We denote this new automaton by N.

Let F be the free group generated by A. We define a relation on the set of
states by $s \sim t$ if there is a path of arrows from s to t in N whose label gives
the identity element of F. This is clearly an equivalence relation. Let Q be the
quotient automaton, each of whose states is one of the equivalence classes above,
with arrows inherited from M, not from N. All $\epsilon$-arrows are omitted from Q. It is
easy to see that Q is welded.

If M starts out by being welded, then it is easy to see that Q = M. 

Consider the identifications of states made during welding (see the passage
following Definition 6.1). It is easy to see that the equivalence classes of states used in
the definition of Q are unaltered by one of these identifications. It follows that the 
automaton Q remains unaltered during the entire welding process. When no more 
identifications can be made, we have Q itself. This shows that Q is independent of 
the order in which the identifications are carried out. In fact Q can be characterized 
as the largest welded quotient of M. 

We claim that every element of L(Q) arises as follows. Let $(w_1, w_2, \ldots, w_{2k+1})$ be
a (2k+1)-tuple of elements of L(M), where $k \ge 0$. Now consider
$w_1 w_2^{-1} w_3 \cdots w_{2k}^{-1} w_{2k+1} \in F$,
and write it in reduced form, that is, cancel adjacent formal inverse letters wherever
possible. If the result is in $A^*$, that is, if after cancellation there are no inverse
elements, then it is in L(Q). Moreover, any element of $A^*$ obtained in this way is in
L(Q). This is straightforward to prove. We leave the details to the reader because
we do not need the result. The proof uses the fact that M is trim.

A welded automaton is minimal. For let s and t be distinct states, and let u
and v be strings over A which lead from s and t respectively to the unique final
state. Then u does not lead from t to the final state and v does not lead from s to 
the final state (otherwise s and t would be equal). It follows that s and t remain 
distinct in the minimized automaton. □ 

If M is a non-empty trim NFA, we denote by Weld(M) the PDFA obtained from
it by welding. To compute Weld(M) efficiently, we first add "backward arrows" to
M. That is, for each arrow $(s, x, t)$ in M, including $\epsilon$-arrows, we add the arrow
$(t, x', s)$, where $x'$ represents a backwards version of x. We also add $\epsilon$-arrows to
connect the initial states, and $\epsilon$-arrows to connect the final states. We then make
use of a slightly modified version of the coincidence procedure of Sims given in [ pi] ,
4.6]. When this stops we have a welded automaton.

In practice, in the automata which we want to weld, backward arrows are needed 
in any case for some of our algorithms. The procedure described in the preceding 
paragraph therefore fits our needs particularly well. 

7. A motivating example of welding

We will look at some particular examples to see what can happen during the 
Knuth-Bendix process on words in a group, and these examples will, we hope, 
convince the reader of the significance of welding as introduced in the previous 
section. 

We will use the standard generators x, y, and their inverses X and Y for the
free abelian group on two generators. Using different orderings on this set of four 
generators, we will see how welding works and why we want to use it. 

Consider the alphabet A = {x, X, y, Y} with the ordering x < X < y < Y, and
denote the identity of $A^*$ by $\epsilon$. Let R be the rewriting system on $A^*$ defined by
the set of rules

$\{(xX, \epsilon), (Xx, \epsilon), (yY, \epsilon), (Yy, \epsilon), (yx, xy), (yX, Xy), (Yx, xY), (YX, XY)\}$.

Using the shortlex-ordering on $A^*$, it is straightforward to see that R is a confluent
system.

We now change the ordering of the set of generators to x < y < X < Y and
interchange the sides of the sixth rule (to get an order reducing and therefore
Noetherian system). Once again the rules define the free abelian group on two
generators. But this time there can be no finite confluent set of rewrite rules defining
the same congruence. To see this, we consider the set of strings $\{x y^n X : n \in \mathbb{N}\}$.
None of these is shortlex-least within its $\leftrightarrow_R^*$-equivalence class, since $x y^n X$
represents the same group element as the shorter string $y^n$. Therefore each of
these strings is reducible relative to any confluent set of rewrite rules which defines
$\leftrightarrow_R^*$. On the other hand, each proper substring of one of the strings $x y^n X$ is
clearly shortlex-least within its $\leftrightarrow_R^*$-equivalence class, and is therefore irreducible.
It follows that a confluent set of rewrite rules must contain each of the strings $x y^n X$
as a left-hand side. Hence, in this situation, the Knuth-Bendix procedure will never
terminate.
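
The following sketch checks the two facts used in this argument for small n: the string $x y^n X$ contains no left-hand side of the eight reordered rules, yet it has the same image in the free abelian group as $y^n$. The helper names are hypothetical, and the check is a finite experiment, not a proof.

```python
# Left-hand sides of the eight rules for the ordering x < y < X < Y.
# The sixth rule has been reversed and now reads Xy -> yX.
LEFT_HAND_SIDES = {"xX", "Xx", "yY", "Yy", "yx", "Xy", "Yx", "YX"}


def is_irreducible(word):
    """True if no left-hand side occurs as a substring of `word`."""
    return not any(lhs in word for lhs in LEFT_HAND_SIDES)


def exponent_sums(word):
    """Image of `word` in the free abelian group on x and y."""
    return (word.count("x") - word.count("X"),
            word.count("y") - word.count("Y"))


for n in range(1, 6):
    w = "x" + "y" * n + "X"
    print(w, is_irreducible(w), exponent_sums(w) == exponent_sums("y" * n))
```

Each line printed shows a string that the finite rule set cannot reduce although a shorter representative exists.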

We will show how to generate, after only a few steps, the automaton giving the 
required infinite confluent set of rewrite rules. 

We consider the rule $r_n = (x y^n X, y^n)$ for some $n \in \mathbb{N}$. The corresponding padded
string $r_n^+$ gives rise to an (n + 3)-state PDFA $M(r_n)$ whose accepted language
consists solely of the rule $r_n$. For n > 2 this PDFA is shown in Figure 1.

[Figure 1: a chain of states numbered 1, 2, 3, ..., n, n+1, n+2, n+3; the arrow labels
(x,y), (y,y), (y,y) and (y,$) are legible in the source.]

Figure 1. The PDFA $M(r_n)$ for n > 2.
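
As an illustration of the padded string of a rule and of the chain automaton built from it, here is a minimal sketch. The names `pad` and `rule_chain` are hypothetical, states are numbered from 0 instead of from 1 as in the figure, and `$` is used as the padding symbol.

```python
PAD = "$"


def pad(u, v):
    """Return the padded pair (u, v)^+ as a list of pairs of letters."""
    n = max(len(u), len(v))
    return list(zip(u.ljust(n, PAD), v.ljust(n, PAD)))


def rule_chain(u, v):
    """Arrows (i, (x, y), i + 1) of a chain automaton accepting only (u, v)^+.

    State 0 is initial and state max(|u|, |v|) is final."""
    return [(i, label, i + 1) for i, label in enumerate(pad(u, v))]


# The rule r_2 = (xyyX, yy): four arrows, hence five states.
for arrow in rule_chain("xyyX", "yy"):
    print(arrow)
```

For $r_n$ the chain has n + 2 arrows and therefore n + 3 states, in agreement with the count given in the text.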

Continuing the discussion of the rules for a free abelian group on two generators,
we define $M_n$ to be the disjoint union $U(M(r_1), \ldots, M(r_n))$ of the automata
$M(r_1), \ldots, M(r_n)$, with set of initial (final) states equal to the collection of initial
(final) states for the various $M(r_i)$. If n > 1 then Weld($M_n$) is isomorphic to the
PDFA given in Figure 2 and the accepted language of this PDFA is the set of rules
$\{r_i : i \in \mathbb{N}\}$. This is independent of n if n > 1.

[Figure 2: four states numbered 1, 2, 3, 4; the arrow label (y,y) is legible in the source.]

Figure 2. A PDFA isomorphic to Weld($M_n$), n > 1.

So in this example, after only two steps, the welding procedure provides us with
a PDFA whose accepted language consists of an infinite set of rules, each of which
is a valid identity in the group $A^*/\leftrightarrow_R^*$. Moreover, by using this PDFA to define a
suitable reduction procedure, each of the strings $x y^n X$ with $n \in \mathbb{N}$ can be reduced
to the shortlex-least representative of its $\leftrightarrow_R^*$-equivalence class.

For this group with the given ordering on the generators, it is not hard to show
that by welding the original defining rules for the group together with the 4 rules
$\{(xyX, y), (xy^2X, y^2), (yXY, X), (yX^2Y, X^2)\}$, we obtain a PDFA whose accepted
language is a confluent set of rules. The reduction procedure, which we will describe
later, corresponding to this PDFA will reduce any string to its shortlex-least
representative.

8. Rule automata 

For the welding procedure to be used in a general Knuth-Bendix situation, 
we need to show that any rules obtained are valid identities in the correspond- 
ing monoid. We now show that if the monoid is a group (the situation we are 
interested in), any rules obtained are valid identities. 

Definition 8.1. Let A be a finite inverse closed set of monoid generators for a
group G and, as before, denote images under the epimorphism $(A^+)^* \to G$ by
overscores. A rule automaton for G is an NFA $M = (S, A^+ \times A^+, \mu, F, S_0)$
together with a function $\phi_M : S \to G$ satisfying

1. $F, S_0 \neq \emptyset$.

2. If s is an initial or final state then $\phi_M(s) = 1_G$.

3. For any $s, t \in S$ and $(x, y) \in A^+ \times A^+$ with $(s, (x, y), t) \in \mu$ we have
$\phi_M(t) = \bar{x}^{-1} \phi_M(s) \bar{y}$.

4. For any $s, t \in S$ with $(s, \epsilon, t) \in \mu$ we have $\phi_M(s) = \phi_M(t)$.

Here is an equivalent way to look at the definition of a rule automaton. We
regard G as the set of states of an automaton with alphabet $A^+ \times A^+$ and with
an arrow $(g, (x, y), h)$ if and only if $\bar{x}^{-1} g \bar{y} = h$ in G. Since G might be infinite, this
would mean considering automata with an infinite number of states, and we would
have to generalize our definitions. (Automata with an infinite number of states are
fairly standard objects in the literature.) In this approach, we next define what we
mean by a morphism of automata. A morphism sends states to states and arrows
to arrows, but preserves labels on arrows. A rule automaton is then a two-variable
automaton with a morphism into the two-variable automaton G. We leave the
straightforward details to the reader.

Example 8.2. If A is a finite inverse closed set of monoid generators for a group
G and $r = (u, v) \in A^* \times A^*$ satisfies $\bar{u} = \bar{v}$ then, as in Figure 1, writing $r^+$ as a
string $(u_1, v_1) \cdots (u_n, v_n) \in (A^+ \times A^+)^*$, we obtain an (n+1)-state rule automaton
$M(r) = (\{s_0, \ldots, s_n\}, A^+ \times A^+, \mu, \{s_0\}, \{s_n\})$ for G where the arrows are given
by

$\mu(s_i, (u_{i+1}, v_{i+1})) = s_{i+1}$, $0 \le i \le n - 1$.

The function $\phi = \phi_{M(r)}$ assigning group elements to states is defined inductively
by $\phi(s_0) = 1_G$ and $\phi(s_i) = \bar{u}_i^{-1} \phi(s_{i-1}) \bar{v}_i$ for $1 \le i \le n$. As usual, the padding
symbol is sent to $1_G$. The fact that $\bar{u} = \bar{v}$ ensures that condition 2 of Definition 8.1
is satisfied.
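
The inductive definition of the state labels in this example can be tried out in the free abelian group on x and y, where group elements are pairs of integers written additively. The sketch below is an assumption-laden illustration: the names `image` and `state_labels` are hypothetical, and because this group is abelian the order of multiplication plays no role here, whereas in general it does.

```python
def image(letter):
    """Image of a letter of A^+ in Z^2; the padding symbol maps to the identity."""
    return {"x": (1, 0), "X": (-1, 0),
            "y": (0, 1), "Y": (0, -1), "$": (0, 0)}[letter]


def state_labels(u, v):
    """Labels phi(s_0), ..., phi(s_n) along the chain automaton of (u, v)."""
    n = max(len(u), len(v))
    u, v = u.ljust(n, "$"), v.ljust(n, "$")
    phi = [(0, 0)]                          # phi(s_0) is the identity
    for a, b in zip(u, v):
        (a1, a2), (b1, b2), (p1, p2) = image(a), image(b), phi[-1]
        # a^{-1} * phi(s_{i-1}) * b, written additively
        phi.append((p1 - a1 + b1, p2 - a2 + b2))
    return phi


print(state_labels("xyyX", "yy"))  # valid rule: the last label is (0, 0)
print(state_labels("x", "y"))      # not an identity: the last label is not (0, 0)
```

The final label is the identity exactly when the two sides of the pair have equal images, which is what condition 2 requires of the final state.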

Remark 8.3. For a two-variable NFA M which is a rule automaton, the PDFA P
obtained by applying the subset construction to the (non-empty) set of initial states
of M (and the sets that arise), is also a rule automaton for G where the map $\phi_P$
is induced from $\phi_M$. The fact that this map is well-defined follows from conditions
2, 3 and 4 of Definition 8.1 and the fact that P is connected (by construction).

The same remark applies to the modified subset construction described in Sec- 
tion |. 

Proposition 8.4. Let A be a finite inverse closed set of monoid generators for a 
group G and suppose that M is a rule automaton for G. Then 

1. Every accepted rule of M is a valid identity in G. 

2. Weld(M) is a rule automaton for G.

Consequently every accepted rule of Weld(M) is a valid identity in G.

Proof. Let $r = (u, v) \in A^* \times A^*$ be an accepted rule of M and write the padded
string $(u, v)^+$ as $(u_1, v_1) \cdots (u_n, v_n)$ where $n = \max\{|u|, |v|\}$. Then in the PDFA P
obtained from M (as in Remark 8.3), there exists a sequence of states $s_0, \ldots, s_n$,
with $s_0$ initial and $s_n$ final; also, for each i, $1 \le i \le n$, there is an arrow from
$s_{i-1}$ to $s_i$ labelled by $(u_i, v_i)$.

Hence, from condition 3 of Definition 8.1, we have

$\phi_P(s_i) = \bar{u}_i^{-1} \cdots \bar{u}_1^{-1} \bar{v}_1 \cdots \bar{v}_i$, for all i with $0 \le i \le n$.

In particular, since $s_n$ is final, condition 2 gives $\phi_P(s_n) = 1_G$, so
$\bar{u}_1 \cdots \bar{u}_n = \bar{v}_1 \cdots \bar{v}_n$ and therefore the rule r is valid in G.

To prove 2, we need only show that when any of the operations described just
after Definition 6.1 is applied to a rule automaton M, we continue to have a rule
automaton. This is obvious. The final statement is now immediate. □

Corollary 8.5. Let A be a finite inverse closed set of monoid generators for a
group G and suppose that $r_1, \ldots, r_m \in A^* \times A^*$ are valid identities in G. Then
any rule accepted by Weld($M(r_1), \ldots, M(r_m)$) is also a valid identity in G.

Proof. For $1 \le k \le m$ let $M(r_k)$ be the rule automaton for G as in Example 8.2.
Then the disjoint union $U(M(r_1), \ldots, M(r_m))$ is also a rule automaton for G and
so the result follows by Proposition 8.4. □

Remark 8.6. Given a rule automaton M for a group G, the map $\phi_M$ may not
be injective. However, if $\phi_M(s) = \phi_M(t)$ and we can somehow determine that this
is the case, then we can connect s to t by an $\epsilon$-arrow, and we still have a rule
automaton. If we then weld, s and t will be identified. So we can hope to make $\phi_M$
injective. However, even if $\phi_M$ is not injective, the rule automaton M can still be
useful for finding equalities in the group G. M may not tell the whole truth, but it
does tell nothing but the truth.



9. Which words are reducible? 

Suppose G is a group with a finite, inverse closed and ordered set of generators
(A, <). In this section, we will work with a fixed two-variable automaton Rules.
The automaton Rules arises in our work by welding together appropriate rules 
found so far in the Knuth-Bendix process. However we will not make use of the 
specific way in which Rules has been constructed. Instead we will write down a 
list of properties of this automaton — when we come to construct the automaton, it 
will be easy to see that the properties are either already satisfied or that it can be 
arranged for them to be satisfied. 



9.1. Properties of the rule automaton. 

1. Rules is a trim rule automaton. 

2. Rules has one initial state and one final state and they are equal. 

3. Rules and its reversal Rev(Rules) are both partially deterministic. These
conditions imply that Rules is welded.

4. Any arrow labelled (x, x), with source the initial state, also has target the
initial state. Any arrow labelled (x, x), with target the initial state, also
has source the initial state. If either of these conditions is not fulfilled, we
can identify the source and target of the appropriate (x, x)-arrows, and then
weld. We will still have a rule automaton. Later on (see Lemmas 9.4 and 9.5)
we will show that (after any necessary identifications and welding) we can
omit such arrows without loss, and, in fact, with a gain given by improved
computational efficiency. After proving these lemmas, we will assume there
are no arrows labelled (x, x) with source or target the initial state of Rules.

Since Rules is a rule automaton, Proposition 8.4 shows that each accepted pair
$(u, v) \in L(Rules)$ gives a valid identity $\bar{u} = \bar{v}$ in G.

The automaton Rules may accept pairs (u, v) such that u is shorter than v.
We cannot consider such a pair as a rule and so we want to exclude it. To this
end we introduce the automaton SL2. This is a five state automaton, depicted in
Figure 3, which accepts pairs $(u, v) \in A^* \times A^*$ such that u and v have no common
prefix, u is shortlex-greater than v and $|v| \le |u| \le |v| + 2$. By combining SL2 with
Rules, we obtain a regular set of rules Set(Rules), which is possibly infinite, namely
$L(Rules) \cap L(SL2)$. An automaton accepting this set can be constructed as follows.
Its states are pairs (s, t), where s is a state of Rules and t is a state of SL2. Its
unique initial state is the pair of initial states in Rules and SL2. A final state is
any state (s, t) such that both s and t are final states. Its arrows are labelled by
(x, y), where $x \in A$ and $y \in A^+$. Such an arrow corresponds to a pair of arrows,
each labelled with (x, y), the first from Rules and the second from SL2.
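
The language accepted by SL2 is described above by three conditions, which can be tested directly on a pair of words. The sketch below is such a direct test and not a transcription of the five state automaton, whose arrows are not reproduced here. The function name and the way the generator ordering is passed are assumptions.

```python
def in_sl2_language(u, v, order):
    """True if (u, v) satisfies the three conditions defining L(SL2):
    no common prefix, |v| <= |u| <= |v| + 2, and u shortlex-greater than v.
    `order` lists the generators from least to greatest."""
    rank = {g: i for i, g in enumerate(order)}
    if u and v and u[0] == v[0]:
        return False                       # non-empty common prefix
    if not (len(v) <= len(u) <= len(v) + 2):
        return False
    if len(u) > len(v):
        return True                        # a longer word is shortlex-greater
    return [rank[g] for g in u] > [rank[g] for g in v]


print(in_sl2_language("xyyX", "yy", "xyXY"))    # lengths differ by exactly 2
print(in_sl2_language("xyyyX", "y", "xyXY"))    # u is too long relative to v
```

A pair accepted by Rules is kept as a rule only if it also passes this test, which is what taking the product with SL2 achieves.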



Figure 3. The automaton SL2. Solid dots represent final states.
Roman letters represent arbitrary letters from the alphabet A and
the labels on the arrows indicate multiple arrows. For example,
from state 2 to itself there is one arrow for each pair in A x A.

9.2. Restrictions on relative lengths. The restriction $|u| \le |v| + 2$ needs some
explanation. The point is that if we have a rule (u, v) with $|u| > |v| + 2$, then we have
an equality $\bar{u} = \bar{v}$ in G. We write u = u'x, where $x \in A$. The formal inverse X of
x is also an element of A. We therefore have a pair (u', vX) which represent equal
elements in G. If our set of rules were to contain such a rule, then u = u'x would
reduce to vXx, and this reduces to v, making the rule (u, v) redundant. This leads
to an obvious technique for transforming any rule we find into a new and better
rule with $|v| \le |u| \le |v| + 2$. Since we take this into account when constructing the
automaton Rules, we are justified in making the restriction.
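
The transformation described in this paragraph can be sketched as a short loop. The sketch assumes that generators are written as lower case letters with their inverses in upper case, as in Section 7; the names `inverse` and `balance` are hypothetical, and no free reduction of the new right-hand side is attempted.

```python
def inverse(letter):
    """Formal inverse of a generator, assuming x <-> X, y <-> Y, and so on."""
    return letter.swapcase()


def balance(u, v):
    """Move letters from the end of u to the end of v, inverted, until
    |u| <= |v| + 2. Each step replaces (u'x, v) by (u', vX), which
    represents an equality in G whenever (u'x, v) does."""
    while len(u) > len(v) + 2:
        u, v = u[:-1], v + inverse(u[-1])
    return u, v


print(balance("xyxyx", "y"))
```

Each step lowers the difference of the two lengths by 2, so the loop stops with a difference of 1 or 2.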

This analysis can be carried further. Let $u = u_1 \cdots u_{r+2} = u' u_{r+2} = u_1 u''$ and
let $v = v_1 \cdots v_r$. If $u_1 > v_1$, then the rule (u, v) can be replaced by the better
rule $(u', v u_{r+2}^{-1})$. If $u_2 > u_1^{-1}$, then (u, v) can be replaced by $(u'', u_1^{-1} v)$. We do in
fact carry out these steps when installing new rules, but we have not so far tried 
adjusting the finite state automaton SL2 accordingly to see what effect this would 
have on the whole process. 

9.3. Rules for which no prefix or suffix is a rule. At the moment, it is possible
for an element $(u, v)^+$ of Set(Rules) to have a prefix or suffix which is also a rule.
This is undesirable because it makes the computations we will have to do bigger 
and longer without any compensating gain. 

Recall that the automaton recognizing Set(Rules) is the product of Rules with
SL2, the initial state being the product of initial states and the set of final states
being any product of final states.

We remove from Rules any arrow labelled (x, x) from the initial state to itself.
We then form the product automaton, as described above, with two restrictions.
Firstly, we omit any arrow whose source is a product of final states. Secondly, we
omit completely the state and all arrows whose source or target is the state with
first component equal to $s_0$, the initial state of Rules, and second component equal
to state 3 of SL2 (see Figure 3). We call the resulting automaton Rules'.

Lemma 9.4. The language accepted by Rules' is the set of labels of accepted paths
in the product automaton, starting from the product of initial states and ending at a
product of final states, such that the only states along the path with first component
equal to $s_0$ are at the beginning and end of the path.

Proof. First consider an accepted path a in Rules'. The only arrows in Rules' with
source having first component $s_0$ are those with source the product of initial states.
In SL2 it is not possible to return to the initial state. It follows that a has the
required form.

Conversely any such path in the product automaton also lies in Rules' because
it avoids all omitted arrows. □

Lemma 9.5. The language accepted by Rules' is the subset of Set(Rules) which has
no proper suffix or proper prefix in Set(Rules).

Proof. If a is an accepted path in Rules', then it is clearly in Set(Rules). Moreover
if it had a proper suffix or proper prefix which was in Set(Rules), there would be
a state in the middle of a with first component $s_0$. We have seen in Lemma 9.4
that this is impossible.



Conversely, we must show that if a is an accepted path in the product automaton
such that no proper prefix and no proper suffix of a would be accepted by the
product automaton, then no state met by a, apart from its two ends, has $s_0$ as a
first component. Let $a = ((s_0, 1), (u_1, v_1), q_1, \ldots, (u_n, v_n), q_n)$, and write $q_0 = (s_0, 1)$.

First suppose $u_1 < v_1$. Since a is accepted by SL2, we must have $v_n = \$$.
Let r < n be chosen as large as possible so that the first component of $q_r$ is $s_0$.
Then $(u_{r+1}, v_{r+1}) \cdots (u_n, v_n)$ will be accepted by Rules and will be accepted by SL2
because $v_n = \$$. Since this cannot be a proper suffix of a by assumption, we must
have r = 0. Hence $q_i$ has a first component equal to $s_0$ if and only if i = 0 or i = n.

Next note that we cannot have $u_1 = v_1$. This is because there is no arrow
labelled (x, x) in SL2 with source the initial state, so a would not be accepted
by the product automaton.

Now suppose that $u_1 > v_1$ and let r > 0 be chosen as small as possible, so that
the first component of $q_r$ is $s_0$. Since $u_1 > v_1$, the second component of $q_r$ will be
a final state (see Figure 3). Since a has no accepted proper prefix, we must have
r = n. Hence $q_i$ has a first component equal to $s_0$ if and only if i = 0 or i = n.

So we have proved the required result for each of the three possibilities. □

Let $w = x_1 \cdots x_n \in A^*$ be a string which we wish to reduce to a Set(Rules)-irreducible
string. It is important for this to be done quickly, as it has been observed by
many people that the Knuth-Bendix process for strings spends most of its time
reducing. Reduction needs to be carried out during critical pair analysis.

Reduction with respect to Set(Rules) is done in a number of steps. First we find
the shortest reducible prefix of w, if this exists. Then we find the shortest suffix of
that which is reducible. This is a left-hand side of some rule in Set(Rules). Then
we find the corresponding right-hand side and substitute this for the left-hand side
which we have found in w. This reduces w in the shortlex-order. We then repeat
the operation until we obtain an irreducible string. The process will be described
in detail in this section and in the subsequent two sections. There is an outline of 
the reduction process in 

Our objective in this section is to find the shortest reducible prefix of w, if this
exists. To achieve this, we must determine whether w contains a substring which
is the left-hand side of a rule belonging to Set(Rules).
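
The order of the steps in a single reduction (shortest reducible prefix, then its shortest reducible suffix, then substitution) can be sketched with a finite table of rules standing in for Set(Rules). This is an assumption made for illustration: the paper stores the rules in automata, whereas the sketch uses a Python dictionary and the hypothetical names `reduce_once` and `reduce_word`.

```python
def reduce_once(word, rules):
    """Apply one rule at the position the text describes, or return None."""
    for end in range(1, len(word) + 1):         # shortest reducible prefix
        for start in range(end - 1, -1, -1):    # its shortest reducible suffix
            lhs = word[start:end]
            if lhs in rules:
                return word[:start] + rules[lhs] + word[end:]
    return None


def reduce_word(word, rules):
    """Repeat single reductions until the word is irreducible."""
    while (reduced := reduce_once(word, rules)) is not None:
        word = reduced
    return word


# The confluent system R of Section 7, for the ordering x < X < y < Y.
R = {"xX": "", "Xx": "", "yY": "", "Yy": "",
     "yx": "xy", "yX": "Xy", "Yx": "xY", "YX": "XY"}

print(reduce_word("yxYX", R))   # the commutator reduces to the empty word
print(reduce_word("Yxyx", R))
```

Because the prefix found is the shortest reducible one, every left-hand side inside it must end at its last letter, so scanning suffixes from shortest to longest finds the left-hand side required.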



Let Rules'' be the automaton obtained from Rules' (see Lemmas 9.4 and 9.5) by
adding arrows labelled (x, x) from the initial state to the initial state.

We construct an NFA Rble_N(Rules) in one variable by replacing each label of the
form (x, y) on an arrow of Rules'' by x. Here $x \in A$ and $y \in A^+$. The name of the
automaton Rble_N(Rules) refers to the fact that the automaton accepts reducible
strings, and does so non-deterministically. We obtain an NFA with no $\epsilon$-arrows.
However there may be many arrows labelled x with a given source. Let LHS(Rules)
be the regular language of left-hand sides of rules in Set(Rules) such that no proper
prefix or proper suffix of the rule is itself a rule.

Lemma 9.6. A*.LHS(Rules) = L(Rble_N(Rules)).

Proof. Because of the extra arrows labelled (x, x) from initial state to initial state,
inserted into Rules'', the inclusion A*.LHS(Rules) $\subseteq$ L(Rble_N(Rules)) is clear.

If u is accepted by Rble_N(Rules), there is a corresponding pair (u, v) accepted
by Rules''. We find a maximal common prefix p of u and v, so that u = pu' and
v = pv'. Rules'' remains in the initial state while reading (p, p). Since the initial
state of SL2 is not a final state, (u', v') must be non-empty. Since there is no way
of returning to the initial state of SL2, once Rules'' starts reading (u', v'), it can
never return to the initial state, and therefore (u', v') must be accepted by Rules'.
Therefore $u' \in$ LHS(Rules), as claimed. □



To find the shortest reducible prefix of a given string w we could feed w into the
FSA Rble_N(Rules). However, reading a string with a non-deterministic automaton
is very time-consuming, as all possible alternative paths need to be followed. 

For this reason, it may at first sight seem sensible to determinize the automa- 
ton. However, determinizing a non-deterministic automaton potentially leads to an 
exponential increase in size. The states of the determinized automaton are subsets
of the state set of the non-deterministic automaton, and there are potentially $2^n$ of
them if there were n states in the non-deterministic automaton. By trying examples, we have
observed that the theoretical exponential blow-up in this construction is sometimes
a practical reality for the automaton Rble_N(Rules).

For this reason, we use a lazy state-evaluation form of the subset construction.
The lazy evaluation strategy (common in compiler design) calculates the arrows
and subsets as and when they are needed, so that a gradually increasing
portion P(Rules) of the determinized version Rble_D(Rules) of Rble_N(Rules)
is all that exists at any particular time.

Lazy evaluation is not automatically an advantage. For example, if in the end
one has to construct virtually the whole determinized automaton Rble_D(Rules) in
any case, then nothing would be lost by doing this immediately. In our special
situation, lazy evaluation is an advantage for two reasons. First, during a single pass
of the Knuth-Bendix process (see Paragraph 2.11), only a comparatively small part
of the determinized one-variable automaton Rble_D(Rules) needs to be constructed.
In practice, this phenomenon is particularly marked in the early stages of the
computation, when the automata are far from being the "right" ones. Second, this
approach gives us the opportunity to abort a pass of Knuth-Bendix, recalculate on
the basis of what has been discovered so far in this pass, and then restart the pass.
If an abort seems advantageous early in the pass, very little work will have been
done in making the structure of Rble_D(Rules) explicit.

We now describe the details of this strategy. 



At the start of a Knuth-Bendix pass (see Paragraph 2.11) we let P(Rules) be
the one-variable automaton consisting of a single non-final state containing only
the ordered pair of initial states of Rules and SL2. At a subsequent time during the
pass, P(Rules) may have increased, but it will always be a portion of Rble_D(Rules).

Suppose now that we wish to find the shortest prefix of the string $w = x_1 \cdots x_n \in A^*$
which is Set(Rules)-reducible. Suppose that $s_0, s_1, \ldots, s_k$ are states of P(Rules),
where $0 \le k \le n - 1$, that $s_0$ is the start state of P(Rules), and that, for each i
with $1 \le i \le k$, we have $\mu(s_{i-1}, x_i) = s_i$. Suppose that the target of the arrow
$\mu(s_k, x_{k+1})$ is not yet defined.

By definition, the subset construction applied to the state $s_k$ of P(Rules) under
the alphabet symbol $x_{k+1}$ yields the set $\mu_1(s_k, x_{k+1})$ as follows. For each $(s', t') \in s_k$,
we look for all arrows in Rble_N(Rules) labelled $x_{k+1}$ with source (s', t'). If (s, t)
is the target of such an arrow, then (s, t) is an element of $\mu_1(s_k, x_{k+1})$. Note that
this subset is always non-empty, because the initial state of Rble_N(Rules) is an
element of each $s_i$.

In the standard determinization procedure one would now look to see whether
there is already a state $s_{k+1}$ of P(Rules) which is equal to $\mu_1(s_k, x_{k+1})$. If not one
would create such a state $s_{k+1}$. One would then insert an arrow labelled $x_{k+1}$ from
$s_k$ to $s_{k+1}$, if there wasn't already such an arrow. A new state is defined to be a final
state of P(Rules) if and only if the subset contains a final state of Rble_N(Rules).

Of course, one does not need to determine the subset $\mu_1(s_k, x_{k+1})$ if there is
already an arrow in P(Rules) labelled $x_{k+1}$ with source $s_k$, because in that case the
subset is already computed and stored.
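
The standard procedure just described, with arrows and subsets computed only when a word being read requires them, can be sketched as follows. The class name `LazyDFA`, its methods and the toy automaton are assumptions for illustration. The sketch covers only the unmodified subset construction for an NFA without $\epsilon$-arrows and with a non-final initial state.

```python
class LazyDFA:
    """Subset construction carried out on demand for an NFA without
    epsilon-arrows. Arrows are triples (source, letter, target)."""

    def __init__(self, arrows, initials, finals):
        self.step = {}
        for s, x, t in arrows:
            self.step.setdefault((s, x), set()).add(t)
        self.finals = set(finals)
        self.start = frozenset(initials)
        self.cache = {}   # (subset, letter) -> subset, filled as words are read

    def target(self, subset, x):
        """Target of the x-arrow from `subset`, computed at most once."""
        key = (subset, x)
        if key not in self.cache:
            self.cache[key] = frozenset(
                t for s in subset for t in self.step.get((s, x), ()))
        return self.cache[key]

    def shortest_accepted_prefix(self, word):
        """Length of the shortest prefix reaching a final subset, or None."""
        subset = self.start
        for k, x in enumerate(word, start=1):
            subset = self.target(subset, x)
            if subset & self.finals:
                return k
        return None


# Toy NFA accepting every word over {a, b} that ends in "ab".
dfa = LazyDFA([(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)], [0], [2])
print(dfa.shortest_accepted_prefix("bbabab"), len(dfa.cache))
```

Only the arrows actually traversed are stored, so reading a second word that follows known arrows costs one dictionary lookup per letter.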

In our procedure we improve on the procedure just described. The point is that
$\mu_1(s_k, x_{k+1})$ may contain pairs which are not needed and can be removed. From
a practical point of view this has the advantage of saving space and reducing the
amount of computation involved when calculating subsequent arrows. Specifically,
we remove a pair (p, q') from $\mu_1(s_k, x_{k+1})$ if q' is state 3 of SL2 (see Figure 3) and
$\mu_1(s_k, x_{k+1})$ also contains the pair (p, q) where q is state 2 of SL2. Removing all
such pairs (p, q') yields the set $\mu_P(s_k, x_{k+1})$ and we add the corresponding arrow
and state to P(Rules), creating a new state if necessary. We make the state a
final state if the subset contains a final state of Rble_N(Rules). The validity of this
modification follows from Theorem 4.1, and we see that some prefix of w arrives at
a final state of P(Rules) if and only if w is Set(Rules)-reducible.
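
The removal of unneeded pairs can be sketched as a filter applied to each newly computed subset. The function name `prune` is hypothetical, and the state numbers 2 and 3 are taken from the description of SL2 in the text; the sketch does not check the hypotheses of Theorem 4.1.

```python
def prune(subset, redundant=3, dominating=2):
    """Remove each pair (p, q') with q' == redundant whenever the subset
    also contains (p, q) with q == dominating."""
    covered = {p for p, q in subset if q == dominating}
    return frozenset((p, q) for p, q in subset
                     if not (q == redundant and p in covered))


subset = {("s", 2), ("s", 3), ("t", 3)}
print(sorted(prune(subset)))
```

In the example only the pair ("s", 3) is removed, since ("t", 2) is absent from the subset.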

When finding the corresponding left-hand side of a rule inside w, we need never
compute beyond a final state of P(Rules). As a space-saving and time-saving measure,
our implementation therefore replaces each final state of P(Rules), as soon as
it is found, by the empty set of states. As remarked above, the standard
determinization of Rble_N(Rules) never produces an empty set of states, and so there is
no possibility of confusion.

Reading w can be quite slow if many states need to be added to P(Rules) while
it is being read. However, reading w is fast when no states need to be built. In 
practice, fairly soon after a Knuth-Bendix pass starts, reading becomes rapid, that 
is, linear with a very small constant. 



10. Finding the left-hand side in a string

We retain the hypotheses of Section 9. Namely, we have a two-variable automaton
Rules satisfying the conditions of Paragraph 9.1. We are given a word
$w = x_1 \cdots x_n$, and we wish to reduce it. In the previous section we showed how
to find the minimal reducible prefix $w' = x_1 \cdots x_m$ of w with respect to the rules
implicitly specified by Rules. We now wish to find the minimal suffix of w' which is
a left-hand side of some rule in Set(Rules). The procedure is quite similar to that
of the previous section.

We form the two-variable automaton Rev(Rules), which we combine with Rev(SL2).
The first automaton is, by hypothesis, partially deterministic. If we determinize the
second automaton, we obtain another PDFA. Figure 4 shows the determinization
of Rev(SL2), where the subsets of states of SL2 are explicitly recorded.



[Figure 4: state diagram; the arrow labels (x,x) and (x,y) and the condition x < y are
legible in the source.]

Figure 4. This PDFA arises by applying the accessible subset
construction to Rev(SL2) in the case where the base alphabet has
more than one element. Each state is a subset of the state set of
Rev(SL2) and final states have a double border. This PDFA, when
reading a pair (u, v) from right to left, keeps track of whether u is
longer than v or not, which it discovers immediately since padding
symbols if any must occur at the right-hand end of v. Note that
this automaton is minimized.



We take the product of the two automata Rev(Rules) and Rev(SL2). A new state
is a pair of old states. An arrow is a pair of arrows with the same label (x, y). The
initial state in the product is the unique pair of initial states. A final state in the
product is a pair of final states.

To form the one-variable non-deterministic automaton Rev_N(LHS(Rules)) without
$\epsilon$-arrows, we use the same states and arrows as in the product automaton, but
replace each label of the form (x, y) in the product automaton by the label x. The
deterministic one-variable automaton Rev_D(LHS(Rules)) can then be constructed
using the subset construction.

We have gone through the above description to give the reader a theoretical 
understanding of what is going on before going into details. Also, our procedure 
may be adaptable to related situations which are not identical to this one. In fact, 
we use not the construction just described, but a related construction which we 
describe below. The point of what we do may not become fully apparent until we 
get to Section 

10.1. Reversing the rules. We first describe a two-variable PDFA M which
accepts exactly the reverse of each rule (λ, ρ)^+ in Set(Rules) such that no proper suffix
and no proper prefix of (λ, ρ)^+ is in Set(Rules) (cf. Lemma 9.5). We assume that
we have a two-variable automaton Rules satisfying the conditions of Paragraph 9.1.
A state of M is a triple (s, i, j), where s is a state of Rev(Rules), i ∈ {0, 1, 2} and
j ∈ {+, −}. M has a unique initial state (s_0, 0, +) where s_0 is the unique initial state
of Rev(Rules). In addition, M has three final states f_0 = (s_0, 0, −), f_1 = (s_0, 1, −)
and f_2 = (s_0, 2, −). We do not allow states of M of the form (s_0, i, j), except for
the initial state and the three final states just mentioned. We will construct the
arrows of M to ensure that any path of arrows accepted by M has first component
equal to s_0 for its initial state and its final state and for no other states. (Compare
this with Lemma 9.4.)



The intention is that in a state (s, i, j), i represents the number of padded
symbols occurring in any path of arrows from the initial state of M to (s, i, j). By 9.2
the padded symbols must be of the form (x, $), where x ∈ A. Furthermore, there
are zero, one or two padded symbols in any rule, and, if padded symbols appear,
they are at the right-hand end of a rule. This means that they are the first symbols
read by M.

The j component is intended to represent whether an arrow with source (s, i, j)
is permitted with label a padded symbol. We take j = + if a padded symbol is
permitted, and j = − if a padded symbol is not permitted.

The following conditions determine the arrows in M. 

1. Each arrow of M is labelled with some (x, y), where x ∈ A and y ∈ A^+.

2. (s, i, j)^{(x,$)} is defined if and only if 1) t = s^{(x,$)} is defined in Rev(Rules), and
2a) (s, i, j) = (s_0, 0, +), the initial state, or 2b) (i, j) = (1, +). In case 2a)
the target is (t, 1, +), unless t is the final state of Rev(Rules), in which case
the target is f_1 = (s_0, 1, −). In case 2b), the target is (t, 2, −), which may
possibly be equal to f_2. The final state f_1 arises in case 2a) when we have
a rule (x, ε), which means that the generator x of our group represents the
trivial element. The final state f_2 arises in case 2b) when we have a rule
(x_1 x_2, ε). This kind of rule arises when x_1 and x_2 are inverse to each other,
usually formal inverses.

3. There are no arrows with source f_i.

4. Suppose (s, i, j) is not a final state. Then (s, i, j)^{(x,y)} with x, y ∈ A is defined
if and only if 1) t = s^{(x,y)} is defined in Rev(Rules), and 2) if t = s_0 then 2a)
i = 0 and x > y or 2b) i > 0 and x ≠ y. We then have (s, i, j)^{(x,y)} = (t, i, −).
This condition corresponds to the requirement that (u, v) can only be a rule if
a) u and v have the same length and u_1 > v_1, where these are the first letters
of u and v respectively, or b) if u is longer than v and u_1 ≠ v_1.
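
The four conditions above can be read as a recipe for computing the target of an
arrow of M on demand from Rev(Rules). The following is a minimal sketch under
assumptions of our own: Rev(Rules) is given as a dictionary rev_delta mapping
(state, (x, y)) to a state, the padding symbol is the string "$", and Python's
ordering of characters stands in for the fixed total ordering on A. The name
m_target is hypothetical.

```python
PAD = "$"  # assumed representation of the padding symbol


def m_target(rev_delta, s0, state, x, y):
    """Target of the arrow labelled (x, y) with source `state` in M, or None.

    `state` is a triple (s, i, j) with i in {0, 1, 2} and j in {"+", "-"};
    s0 is the unique initial and final state of Rev(Rules).
    """
    s, i, j = state
    initial = (s0, 0, "+")
    if s == s0 and state != initial:
        return None  # condition 3: no arrows leave the final states f_i
    t = rev_delta.get((s, (x, y)))
    if t is None:
        return None  # the arrow must already exist in Rev(Rules)
    if y == PAD:  # condition 2: padded arrows
        if state == initial:  # case 2a
            return (s0, 1, "-") if t == s0 else (t, 1, "+")
        if (i, j) == (1, "+"):  # case 2b
            return (t, 2, "-")
        return None
    # condition 4: unpadded arrows; returning to s0 requires an ordering check
    if t == s0 and not ((i == 0 and x > y) or (i > 0 and x != y)):
        return None
    return (t, i, "-")
```

Computing arrows in this way, rather than storing M, is in the spirit of the
remark that M need not be explicitly constructed.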



Lemma 10.2. The language accepted by M is the set of reversals of rules (λ, ρ)^+ ∈
Set(Rules) such that no proper suffix and no proper prefix of (λ, ρ)^+ is in Set(Rules).

The proof of this lemma is much the same as the proofs of Lemmas 9.4 and 9.5.
We therefore omit it.

Using the above description of M, we now describe how to obtain a non-deterministic
one-variable automaton Rev_N(LHS(Rules)) from M in an analogous manner to that
used to obtain RhleM{Rules) from Rules^' in Section ||. Rev_N(LHS(Rules)) accepts
reversed left-hand sides of rules in Set(Rules) which do not have a proper prefix
or a proper suffix which is in Set(Rules). Rev_N(LHS(Rules)) has the same set of
states as M and the same set of arrows. However, the label (x, y) with x ∈ A and
y ∈ A^+ of an arrow in M is replaced by the label x in Rev_N(LHS(Rules)). The
two automata, M and Rev_N(LHS(Rules)), have the same initial state and the same
final states. Hence Rev_N(LHS(Rules)) accepts all reversed left-hand sides of rules
whose reversals are accepted by M.

The one-variable automaton Q(Rules) is formed from Rev_N(LHS(Rules)) by a
modified subset construction, using lazy evaluation. Q(Rules) is part of the
one-variable PDFA Rev_D(LHS(Rules)), the determinization of Rev_N(LHS(Rules)). As
we shall see, a string is accepted by Q(Rules) only if its reversal λ is the left-hand
side of a rule in Set(Rules) and no proper substring of λ has this property.

Note 10.3. In order to construct states and arrows in Q(Rules), one only needs
to have access to Rev(Rules), that is, neither M nor Rev_N(LHS(Rules)) has to be
explicitly constructed.



10.4. The algorithm for finding the left-hand side. Suppose we have a string
x_1 ⋯ x_n ∈ A* and we know it has a suffix which is the left-hand side of some rule
in Set(Rules). Suppose no proper prefix of x_1 ⋯ x_n has this property. We give an
algorithm that finds the shortest such suffix.

We read the string from right to left, starting with x_n. We assume that
x_{k+1} x_{k+2} ⋯ x_n has been read so far and that as a result the current state of
Q(Rules) is S_k, where S_k is a state of Q(Rules) (so S_k is a subset of the set of
states of Rev_N(LHS(Rules))).

We start the algorithm with k = n and the current state of Q(Rules) equal to
the singleton {(s_0, 0, +)} whose only element is the initial state of M, where s_0 is
the initial state of Rev(Rules). Q(Rules) has three final states, namely the singleton
sets {f_i} for i = 0, 1, 2.

The steps of the algorithm are as follows: 

1. Record the current state as the k-th entry in an array of size n, where n is
the length of the input string.

2. If the current state is not a final state, go to Step 10.4.3. If the current state
is a final state, then stop. Note that the initial state of Q(Rules) is not a final
state, so this step does not apply at the beginning of the algorithm. If the
current state is a final state, then the shortest suffix of x_1 ⋯ x_n which is the
left-hand side of a rule in Set(Rules) can then be proved to be x_{k+1} x_{k+2} ⋯ x_n.

3. If the arrow labelled x_k with source the current state is already defined, then
redefine the current state to be the target of this arrow and decrease k by
one.

4. If the preceding step does not apply, we have to compute the target T of the
arrow labelled x_k with source the current state S_k. We do this by looking
for all arrows labelled x_k in Rev_N(LHS(Rules)) with source in S_k, and define
T to be the set of all targets. Note that this set of targets cannot be empty
since we know that some suffix of x_1 ⋯ x_n is accepted by Rev_N(LHS(Rules)).

5. There are two modifications which we can make to the previous step.

(a) Firstly, if the set of targets contains some final state f_j, then we look for
the largest value of i = 0, 1, 2 such that f_i ∈ T and redefine T to be {f_i}.
We then insert into Q(Rules) an arrow labelled x_k from S_k to this final
state. If we have found that T is a final state, we set S_{k−1} equal to T,
decrease k by one, and go to Step 10.4.1.

(b) Secondly, if, while calculating the set T, we find that a state s of Rev(Rules)
occurs in more than one triple (s, i, j), then we only include the triple with
the largest value of i. For this to be well-defined, we need to know that
(s, i, +) and (s, i, −) cannot both come up as potential elements of T;
this is addressed in the proof of Theorem 10.5 along with justifications of
the other modifications.

6. Having found T, see if it is equal to some state T' of Q(Rules) which has
already been constructed. If so, define an arrow labelled x_k from S_k to T'.

7. If T has not already been constructed, define a new state of Q(Rules) equal
to T and define an arrow labelled x_k from S_k to T.

8. Set the current state equal to T and decrease k by one. Then go to Step 10.4.1.
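
The steps above amount to a subset construction evaluated lazily while reading
the string from right to left. The following sketch illustrates this under
assumptions of our own: the non-deterministic automaton is passed as a dictionary
nfa mapping (triple, letter) to a set of triples, the arrows of Q(Rules) already
computed are kept in a dictionary cache, and the toy automaton TOY_NFA encodes
just the two rules (ba, ab) and (aa, ε). The names are hypothetical.

```python
def find_lhs_suffix(word, nfa, s0, cache):
    """Return (shortest suffix that is a left-hand side, history of states).

    Returns (None, history) if the hypothesis on `word` fails.
    """
    finals = {frozenset([(s0, i, "-")]) for i in range(3)}
    current = frozenset([(s0, 0, "+")])
    k = len(word)
    history = {k: current}  # Step 1: record the state reached at each k
    while current not in finals:  # Step 2
        if k == 0:
            return None, history
        x = word[k - 1]  # this is x_k in the 1-indexed notation of the text
        if (current, x) not in cache:  # Step 4: compute the target lazily
            targets = set()
            for q in current:
                targets |= nfa.get((q, x), set())
            if not targets:
                return None, history
            final_levels = [i for (s, i, _j) in targets if s == s0]
            if final_levels:  # Step 5(a): keep only the largest final state
                targets = {(s0, max(final_levels), "-")}
            else:  # Step 5(b): one triple per state s, with the largest i
                best = {}
                for (s, i, j) in targets:
                    if s not in best or i > best[s][1]:
                        best[s] = (s, i, j)
                targets = set(best.values())
            cache[(current, x)] = frozenset(targets)  # Steps 6 and 7
        current = cache[(current, x)]  # Steps 3 and 8
        k -= 1
        history[k] = current
    return word[k:], history


# Toy automaton for the rules (ba, ab) and (aa, epsilon), read right to left.
TOY_NFA = {
    ((0, 0, "+"), "a"): {(1, 0, "-"), (2, 1, "+")},
    ((1, 0, "-"), "b"): {(0, 0, "-")},
    ((2, 1, "+"), "a"): {(0, 2, "-")},
}
```

Because subsets are frozensets, two equal subsets are automatically the same
dictionary key, which plays the role of the comparison in Step 6.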



Theorem 10.5. Suppose x_1 ⋯ x_n has a suffix which is the left-hand side of a rule
in Set(Rules) and suppose no proper prefix of x_1 ⋯ x_n has this property. Then the
above algorithm correctly computes the shortest such suffix.



Proof. We first show that the modification in Step 10.4.5.b is well-defined in the
sense that triples (s, i, +) and (s, i, −) cannot both occur while calculating T. The
reason for this is that the third component can only be + if either none of x_1 ⋯ x_n
has been read, in which case the only relevant state is (s_0, 0, +), or else only x_n
has been read, in which case the possible relevant states are (f, 1, −), (s, 1, +) with
s ≠ f, and (s, 0, −). So a state of the form (s, i, j) with a given s occurs at most
once in a fixed subset with the maximum possible value of i.

The effect of Step 10.4.5.a in the above algorithm is to ensure that termination
occurs as soon as a final state of Rev(Rules) appears in a calculated triple. Since we
know that x_1 ⋯ x_n contains a left-hand side of a rule in Set(Rules) as a suffix we
need only show that the introduction of Step 10.4.5.b does not affect the accepted
language of the constructed automaton. This will be a consequence of Theorem 4.1
as we now proceed to show.

Consider a triple t = (s, i, j) arising during the calculation of a subset T, and
suppose that s is a non-final state of Rev(Rules). If j = + then T cannot contain
both (s, 0, +) and (s, 1, +) and so t will not be removed from T as a result of
Step 10.4.5.b. Therefore we only need to consider the case j = −. For k = 0, 1, 2,
let L_k ⊂ A* × A* be the language obtained by making (s, k, −) the only initial
state of M, and observe that there can be no padded arrows in any path of arrows
from (s, k, −) to a final state of M. Now by considering the definition of the
non-padded transitions in M given in 10.1.4, it is straightforward to see that L_0 ⊂
L_1 = L_2. Therefore, since Rev_N(LHS(Rules)) has no ε-arrows, we have just shown
that the hypotheses of Theorem 4.1 apply to Step 10.4.5.b. Hence the omission in
Step 10.4.5.b does not affect the accepted language of Q(Rules). □



As with P(Rules), reading a word into Q(Rules) from right to left can be slow in 
the initial stages of a Knuth-Bendix pass, but soon speeds up to being linear with 
a small constant. 



11. Finding the right-hand side of a rule 

We retain the hypotheses of Section Namely, we have a two- variable rule 
automaton Rules which is welded and satisfies various other minor conditions. We 
are given a word w = x_1 ⋯ x_n, and we wish to reduce it relative to the rules
implicitly contained in Rules. So far we have located a left-hand side λ which
is a substring of w. In this section we show how to construct the corresponding
right-hand side.

We first go into more detail as to how we propose to reduce w. In outline we
proceed as follows.



11.1. Outline of the reduction process. 

1. Feed w one symbol at a time into the one-variable automaton P(Rules)
described in Section ^, storing the history of states reached on a stack.

2. If a final state is reached after some prefix u of w has been read by P(Rules),
then u has some suffix which is a left-hand side. Moreover, this procedure
finds the shortest such prefix.

3. Feed u from right to left into Q(Rules). A final state is reached as soon
as Q(Rules) has read the shortest suffix λ of u such that there is a rule
(λ, ρ) ∈ Set(Rules). We now have u = pλ and w = pλq, where p, q ∈ A*,
every proper prefix of pλ and every proper suffix of λ is Set(Rules)-irreducible.

4. Find ρ, the smallest string such that there is a rule (λ, ρ) in S (see 2.11). If
there is no such rule in S, find ρ by a method to be described in this section,
such that ρ is the smallest string such that (λ, ρ) ∈ Set(Rules).

5. If (λ, ρ) is not already in S, insert it into the part of S called New.

6. Replace λ with ρ in w and pop |λ| levels off the stack so that the stack
represents the history as it was immediately after feeding p into P(Rules).

7. Redefine w to be pρq. Restart at Step 1 as though p has just been read and
the next letter to be read is the first letter of ρ. The history stack enables
one to do this.
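
The control flow of this outline (read left to right, detect the shortest suffix
that is a left-hand side, pop |λ| levels, then continue reading from the first
letter of ρ) can be illustrated with a much simpler stand-in. In the sketch
below, which is ours and not the procedure of this paper, an explicit finite
dictionary rules replaces the automata P(Rules) and Q(Rules) and the right-hand
side routine, and the stack holds letters rather than automaton states. The
rules are assumed to be strictly reducing in a well-ordering so that the loop
terminates.

```python
def reduce_word(word, rules):
    """Reduce `word` using a finite dict mapping left-hand side -> right-hand side."""
    stack = []                       # letters of p read so far (stand-in for the history stack)
    pending = list(reversed(word))   # unread letters; the next letter is on top
    while pending:
        stack.append(pending.pop())  # Step 1: read one more symbol
        # Steps 2 and 3: look for the shortest suffix that is a left-hand side
        for size in range(1, len(stack) + 1):
            lhs = "".join(stack[-size:])
            if lhs in rules:
                del stack[-size:]                     # Step 6: pop |lambda| levels
                pending.extend(reversed(rules[lhs]))  # Step 7: rho is read next
                break
    return "".join(stack)
```

For instance, with rules = {"aA": "", "Aa": "", "ba": "ab"}, where A is taken to
be the formal inverse of a, the word "baA" is rewritten to "abA".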

Note that other strategies might lead to finding first some left-hand side in w
other than λ. Moreover, there may be several different right-hand sides ρ with
(λ, ρ) ∈ Set(Rules). A rule (λ, ρ) in Set(Rules) gives rise to paths in Rules, SL2
and Rev_D(SL2). We will find the path for which the right-hand side ρ is shortlex-least,
given that the left-hand side is equal to λ.

Let λ = y_1 ⋯ y_m. Recall that a state of the one-variable automaton Q(Rules)
used to find λ is a set of states of the form (s, i, j), where s is a state of Rules, i ∈
{0, 1, 2} and j ∈ {+, −}. When finding λ we kept the history of states of Q(Rules)
which were visited (see Step 10.4.1). Let Q_k be the set of triples (s, i, j) comprising
the state of Q(Rules) after reading the string y_{k+1} ⋯ y_m from right to left. Q_0 =
{f_i} = {(s_0, i, −)} where s_0 is the unique initial and final state of Rules, and i is
the difference in length between λ and the ρ that we are looking for.



11.2. Right-hand side routine. Inductively, after reading y_1 ⋯ y_k we will have
determined z_1 ⋯ z_k, the prefix of ρ. Inductively we also have a triple (s_k, i_k, j_k),
where s_k is a state of Rules, i_k is 0 or 1 or 2 and j_k is + or −. Note that we always
have m − k ≥ i_k.

1. If m − k = i_k, then we have found ρ = z_1 ⋯ z_k and we stop. So from now on
we assume that m > i_k + k. This means that the next symbol (y_{k+1}, z_{k+1}) of
(λ, ρ) does not have a padding symbol in its right-hand component.

2. We now try to find z_{k+1} by running through each element z ∈ A in increasing
order. Set z equal to the least element of A.

3. If k = 0 and i_0 = 0, then λ and ρ will be of equal length, so the first symbol
of (λ, ρ) must be (y_1, z_1), where y_1 > z_1. So at this stage we can prove that
we have y_1 > z, since we know that there must be some right-hand side
corresponding to our given left-hand side.

If k = 0 and i_0 > 0, then the first symbol of (λ, ρ)^+ is (y_1, z_1) with z_1 ∈ A
and y_1 ≠ z_1. If k = 0, i_0 > 0 and y_1 = z, we increase z to the next element
of A.

4. Here we are trying out a particular value of z to see whether it allows us to
get further. We look in Rules to see if s_{k+1} = s_k^{(y_{k+1}, z)} is defined. If it is not
defined, we increase z to the next element of A and go to Step 11.2.3.

5. If s_{k+1} is defined in Step 11.2.4, we look in Q_{k+1} for a triple (s_{k+1}, i_{k+1}, j_{k+1})
which is the source of an arrow labelled (y_{k+1}, z) in the automaton M. Recall
that M was defined in Section ^. Note that, by the proof of Theorem 10.5,
Q_{k+1} contains at most one element whose first coordinate is s_{k+1}. As a result,
the search can be quick.

6. If (s_{k+1}, i_{k+1}, j_{k+1}) is not found in Step 11.2.5, increase z to the next element
of A and go to Step 11.2.3.

7. If (s_{k+1}, i_{k+1}, j_{k+1}) is found in Step 11.2.5, set z_{k+1} = z, increase k and go
to Step 11.2.1.



The above algorithm will not hang, because each triple (s_k, i_k, j_k) that we use
does come from a path of arrows in M which starts at the initial state of M and
ends at the first possible final state of M. Therefore all possible right-hand sides
ρ such that (λ, ρ) ∈ Set(Rules) are implicitly computed when we record the states
of Q(Rules) (see Step 10.4.1). Since i_k does not vary during our search, we will
always find the shortest possible ρ, with |λ| − |ρ| being equal to this constant value
of i_k. Since we always look for z in increasing order, we are bound to find the
lexicographically least ρ.

We remind the reader that an overview of the entire reduction process for a given
string w is given in 11.1.



12. Our version of Knuth-Bendix. 

For finite Noetherian rewriting systems the question of confluence is decidable by
the critical pair analysis described in Section |^. However, for infinite Noetherian
rewriting systems the confluence question is, in general, undecidable. Examples
exhibiting undecidability are given in [^Oj and are length-reducing rewriting systems
R which are regular in the sense that R contains only a finite number of right-hand
sides and for each right-hand side r, the set {l : (l, r) ∈ R} is a regular language.
These examples are in the context of rewriting for monoids. As far as we know,
there is no known example of undecidability if we add to the hypothesis that the
monoid defined by R is in fact a group.

In this section we consider a rewriting system which is the accepted language of
a rule automaton for some finitely presented group. We describe a Knuth-Bendix
type algorithm for such a system. In light of the undecidability result mentioned
above, our algorithm does not provide a test for confluence. We can however use
our algorithm together with other algorithms for dealing with shortlex-automatic
groups, to prove confluence by an indirect route if the group is shortlex-automatic.
Details of the theory of how this is done can be found in ||] . The practical details
are carried out in programs by Derek Holt (see |^ ).

Suppose throughout that G is a monoid given by a finite presentation ⟨A/R⟩,
where A is a set of generators for G with a fixed total ordering < and R is a
finite set of equalities. The monoid is defined by the congruence generated by these
equalities. We will assume that there is an involution ι : A → A (which will send
each generator to its formal inverse) such that, for each element x ∈ A, there are
equalities in R of the form xι(x) = ε and ι(x)x = ε. This implies that G is a group.
The equalities in R can be regarded as a finite set of rules which define G.

In our algorithm, we keep two sets of rules. One set, which we call S, is a finite
set of rules. The other is a possibly infinite set of rules which is kept implicitly
in a rule automaton called Rules. When we want to specify that we are working
with the Rules automaton during the nth Knuth-Bendix pass (see 2.11 for the
definition of a Knuth-Bendix pass), we will use the notation Rules[n]. We extract
explicit rules from Rules[n] by taking elements of the intersection Set(Rules[n]) =
L(Rules[n]) ∩ L(SL2). The two-variable automaton SL2 was defined in Section ||
and is depicted in Figure ^.

S will change almost continually, while Rules is constant during a Knuth-Bendix
pass. We do in fact need to change Rules from time to time, and we do so as the
last step of each Knuth-Bendix pass. We will perform the Knuth-Bendix process
using the rules in S for critical pair analysis, as described in 2.4.



12.1. Rapid reduction. A difference between our situation and that of classical 
Knuth-Bendix is that reduction is not carried out by applying the rules of S. When 
running Knuth-Bendix, one of the most time-consuming aspects is reduction. This 
is partly because there is a lot of reduction to be done and partly because one 
normally has to spend a long time looking through a long list of rules to see if the 
string one is trying to reduce contains a left-hand side of some rule. Much of the 
effort in producing new Knuth-Bendix algorithms, like the algorithm described in 
this paper, goes on finding methods of locating relevant rules quickly. In the past 
this has involved using structures which use a lot of space. In our procedure we 
use the method described in 11.1 to find relevant rules quickly without using an
inordinate amount of space. We refer to this as R-reduction. We also use the terms 
R-reduce and its various derivatives. R stands for "relation", for "reduction" and 
for "rapid". 

Note 12.2. Note that a string is R-reducible at one point in a Knuth-Bendix pass 
if and only if it is R-reducible at another point in the same pass. However, as we 
shall see, the result of R-reduction may change during a pass, because S changes. 



12.3. The basic structures. The basic structures used in our procedure are: 



1. A two-variable automaton Rules satisfying the conditions laid down in 9.1.

2. A finite set S of rules, which is the disjoint union of several subsets of rules:
Considered, This, New and Delete.

3. Considered is a subset of S such that each rule has already been compared with
each other rule in Considered, including with itself, to see whether left-hand
sides overlap. The consequent critical pair analysis has also been carried out
for pairs of rules in Considered. Such rules do not need to be compared with
each other again.

4. This is a subset of S containing rules which we plan to use during this pass
to compare for overlaps with the rules in Considered, as in 2.5. These rules
have been minimized during the current pass (see 12.7) and so should not be
minimized again.

5. New is a subset of S containing new rules which have been found during the
current pass, other than those which are output by the minimization routine
(see 12.7). Non-trivial rules which are the final output of the minimization
routine are added to This.

6. Delete is a subset of S containing rules which are to be deleted at the beginning
of the next pass.

7. A two-variable automaton WDiff which contains all the states and arrows of
Rules[n], and possibly other states and arrows. It satisfies the conditions of
9.1.



12.4. Initial arrangements. Before describing the main Knuth-Bendix process,
we explain how the data structures are initially set up. Recall that R (the set of
relations, which should be distinguished from the R of R-reduction) is the original
set of defining relations together with special rules of the form (xι(x), ε) and
(ι(x)x, ε) which make the formal inverse ι(x) into the actual inverse of x.

We rewrite each non-special element of R in the form of a relator, which we
cyclically reduce in the free group. Since ι(x) is the formal inverse of the letter x,
we are able to write down the formal inverse of any string in A*. We may therefore
assume that each relator has the form lr^{-1}, where l and r are elements of A* and
(l, r) is accepted by SL2.

For each rule (l, r), including the special rules, we form a rule automaton, as
explained in Example 8.2. These automata are then welded together to form the
two-variable rule automaton WDiff satisfying the conditions of 9.1. Each state and
arrow of WDiff is marked as needed. (At certain well-chosen moments we will delete
from WDiff states and arrows that are not needed.) Each of these rules is inserted
into This. Considered, New and Delete are initially empty. Set Rules[1] = WDiff.

12.5. The main loop: a Knuth-Bendix pass. A significant proportion of the
time in a Knuth-Bendix pass is spent in applying a procedure which we term
minimization. Each rule encountered during the pass is input to this procedure
and the output is called a minimal rule. The exact details of this process are
given in sections 12.6 and 12.9, but we point out that minimization often results
in rules being added to and/or deleted from S. Any rules added to S during the
minimization of a rule (λ, ρ) are strictly smaller than (λ, ρ) in the ordering of 2.9.



1. At this point, This is empty. If n > 0, save space by deleting previously
defined automata P(Rules[n]), Q(Rules[n]) and Rules[n]. Increment n. The
integer n records which Knuth-Bendix pass we are currently working on.

2. Delete the rules in Delete.

3. For each rule (λ, ρ) in Considered, minimize (λ, ρ) as in 12.7 and handle the
output as in 12.9.

4. For each rule (λ, ρ) in New, minimize (λ, ρ) as in 12.7 and handle the output
as in 12.9. Since rules added to New during minimization are always strictly
smaller than the rule being minimized, it follows that eventually each rule in
New will be processed; that is, the process of examining rules in New does
not continue indefinitely.

5. For each rule (λ, ρ) in This:

(a) Delete the rule from This and add it to Considered.

(b) For each rule (λ_1, ρ_1) in Considered:
Look for overlaps between λ and λ_1. That is, we have to find
each suffix of λ which is a prefix of λ_1 and each suffix of λ_1
which is a prefix of λ. Note that we may have to allow λ = λ_1
in order to deal with the case where two different rules have the
same left-hand side. In this case, both the prefix and suffix of
both left-hand sides are equal to λ = λ_1. Then R-reduce in two
different ways as in 2.5, obtaining a pair of strings (u, v). If they
differ then rearrange them so that u > v and insert the result
into New, unless it is already in S.

6. Delete from WDiff all arrows and states which are not marked as needed.
Copy WDiff into Rules[n + 1] and mark all arrows and states of WDiff as
not needed. The details of this step are given in 12.10.

7. This ends the description of a Knuth-Bendix pass. Now we decide whether
to terminate the Knuth-Bendix process. Since we know of no procedure
to decide confluence of an infinite system of rules (indeed, it is probably
undecidable), this decision is taken on heuristic grounds. In our context, a
decision to terminate could be taken simply on the grounds that WDiff and
Rules[n] have the same states and arrows. In other words, no new
word-differences or arrows between word-differences have been found during this
pass. If the Knuth-Bendix process is not terminated, go to 12.5.1.
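
The search for overlaps in Step 5(b) above, and the two one-step rewrites of the
overlap word, can be sketched as follows. This is an illustration under our own
assumptions: words are Python strings, a rule is a pair of strings, and the
names overlaps and critical_pair are hypothetical. The subsequent R-reduction of
the two results is not shown.

```python
def overlaps(lhs1, lhs2):
    """Yield (prefix, shared, suffix) with lhs1 = prefix + shared and
    lhs2 = shared + suffix, for every non-empty shared string."""
    for size in range(1, min(len(lhs1), len(lhs2)) + 1):
        if lhs1[-size:] == lhs2[:size]:
            yield lhs1[:-size], lhs2[:size], lhs2[size:]


def critical_pair(rule1, rule2, size):
    """Rewrite the overlap word lhs1[:-size] + lhs2 once with each rule.

    `size` is the length of a suffix of lhs1 which is also a prefix of lhs2.
    """
    (lhs1, rhs1), (lhs2, rhs2) = rule1, rule2
    via_rule1 = rhs1 + lhs2[size:]    # replace the leading copy of lhs1
    via_rule2 = lhs1[:-size] + rhs2   # replace the trailing copy of lhs2
    return via_rule1, via_rule2
```

To cover both directions mentioned in the text, one would call overlaps(λ, λ_1)
and overlaps(λ_1, λ); the case λ = λ_1 appears as an overlap of full length.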



12.6. Minimizing a rule. We now provide the details of the minimization routine. 

Definition 12.7. Let (u, v) ∈ A* × A* and let u = u_1 ⋯ u_p and v = v_1 ⋯ v_q,
where u_i, v_j ∈ A. We say that (u, v) is a minimal rule if u ≠ v, u = v in G and the
following procedure does not change (u, v). The procedure is called minimizing a
rule or the minimization routine. We always start the minimization routine with
u >_SL v, though this condition is not necessarily maintained as u and v change
during the routine.

1. R-reduce the maximal proper prefix u_1 ⋯ u_{p−1} of u obtaining u'. Reduction
may result in rules being added to New as described in 11.1.5. If u ≠ u'u_p,
change u to u'u_p and go to Step 12.7.3.

2. R-reduce the maximal proper suffix u_2 ⋯ u_p of u obtaining u''. Reduction
may result in new rules being added to New. Replace u by u_1 u''.

3. If u has changed since the original input to the minimization routine, then
R-reduce u. This may result in new rules being added to New.

4. If p > q + 2 or if p = q + 2 and u_1 > v_1, replace (u, v) by
(u_1 ⋯ u_{p−1}, v_1 ⋯ v_q ι(u_p)) and repeat this step until we can go no further.

5. If p = q + 2 and u_2 > ι(u_1), replace (u, v) by (u_2 ⋯ u_p, ι(u_1) v_1 ⋯ v_q).

6. If q > 0 and u_1 = v_1, cancel the first letter from u and from v and repeat this
step.

7. If q > 0 and u_p = v_q, cancel the last letter from u and from v and repeat this
step.

8. R-reduce v as explained in 11.1. This may result in rules being added to New
as described in 11.1.5.

9. If v > u, interchange u and v.

10. If (u, v) has changed since the last time Step 12.7.4 was executed, go to
Step 12.7.4.

11. Output (u, v) and stop.

From Note 12.2, we see that if a rule is minimal at one time during a Knuth-Bendix
pass then it is minimal at all later times during the same pass.

Note that the output could be (ε, ε), which means that the rule is redundant.
Otherwise we have output (u, v) with u > v. Note that the minimization procedure
keeps on decreasing (u, v) in the ordering given by 2.9. Since this is a well-ordering,
the minimization procedure has to stop. Also any rules added to New as a result
of R-reduction during minimization are smaller than (u, v).
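
Steps 4 to 7 of the minimization routine, which balance the lengths of the two
sides and cancel common letters, involve no R-reduction and can be sketched in
isolation. The sketch below rests on assumptions of our own: words are Python
strings, the involution ι is modelled by swapping the case of a letter, Python's
character ordering stands in for the ordering on A, and in Step 4 the comparison
u_1 > v_1 is treated as false when v is empty. The names inv and
balance_and_cancel are hypothetical.

```python
def inv(x):
    """Assumed model of the involution: 'a' <-> 'A', 'b' <-> 'B', and so on."""
    return x.swapcase()


def balance_and_cancel(u, v):
    """Apply Steps 4, 5, 6 and 7 of the minimization routine once, in order."""
    # Step 4: move letters from the right end of u to the right end of v
    while len(u) > len(v) + 2 or (len(u) == len(v) + 2 and v and u[0] > v[0]):
        u, v = u[:-1], v + inv(u[-1])
    # Step 5: possibly move the first letter of u to the left end of v
    if len(u) == len(v) + 2 and u[1] > inv(u[0]):
        u, v = u[1:], inv(u[0]) + v
    # Step 6: cancel equal first letters
    while u and v and u[0] == v[0]:
        u, v = u[1:], v[1:]
    # Step 7: cancel equal last letters
    while u and v and u[-1] == v[-1]:
        u, v = u[:-1], v[:-1]
    return u, v
```

For example, the relator abab, entered as the rule (abab, ε), is balanced to
(ab, BA), which expresses that ab is equal to its own inverse.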

Lemma 12.8. Let (λ_1, ρ_1) be the output from minimizing (λ, ρ). If λ has no proper
R-reducible substrings, then λ_1 is a non-trivial substring of λ.

Proof. Under the hypotheses, the successive steps of minimization change λ and ρ,
while maintaining the inequality λ > ρ. As a result the left-hand and right-hand
sides are never interchanged. It follows that λ_1 > ρ_1, so λ_1 is non-trivial. It is easy
to see that λ_1 is a substring of λ. □

12.9. Handling minimization output. Suppose the input to minimization is
(λ, ρ) and its output is (λ_1, ρ_1). We now describe how the rule (λ_1, ρ_1) is treated.

1. In order to avoid unnecessary subsequent work of minimization, mark (λ, ρ)
as minimized and (λ_1, ρ_1) as minimal for this pass.

2. If (λ_1, ρ_1) ≠ (ε, ε), incorporate (λ_1, ρ_1) into the language accepted by WDiff,
using the method described in 12.10.

3. If (λ_1, ρ_1) = (λ, ρ), that is, if (λ, ρ) was already minimal, then:

(a) If (λ, ρ) was in Considered, do nothing.

(b) If (λ, ρ) was in New, move it to This.

(c) We saw in 12.5 that the above two situations are the only ones in which
minimization is carried out.

4. If (λ, ρ) ≠ (λ_1, ρ_1) ≠ (ε, ε), then:

(a) If (λ_1, ρ_1) is already in S, mark the copy in S as minimal.

(b) If (λ_1, ρ_1) is not already in S, insert it in This.

5. If (λ_1, ρ_1) = (ε, ε), do nothing.

6. If λ was affected by the minimization routine before Step 12.7.4, that is, if
some proper substring of λ was R-reducible, then delete (λ, ρ).

7. If, at the time of minimization, all proper substrings of λ were R-irreducible
and if (λ, ρ) was not minimal, move (λ, ρ) to the Delete list. The reason
for this possibly surprising policy of not deleting immediately is that further
reduction during this pass may once again produce λ as a left-hand side by
the methods of Sections ^ and |l^. We want to avoid the work involved in
finding the right-hand side by the method of Section 11. For this, we need to
have a rule in S with left-hand side equal to λ (see 11.1.5).



12.10. Details on the structure of WDiff. At the beginning of Step 12.5.6,
each state s of WDiff has an associated string w_s ∈ A* which is irreducible with
respect to Set(Rules[n]). WDiff is a rule automaton: we associate the element
of G represented by w_s to the state s. These state labels are calculated as and
when new states and arrows are added to WDiff during a Knuth-Bendix pass
(see 12.11).

At the end of the nth Knuth-Bendix pass, WDiff is an automaton which repre- 
sents the word-differences and arrows between them encountered during that pass. 
At this stage the string attached to each state is irreducible with respect to the rules 






in Set(Rules[n]) but not necessarily with respect to the rules implicitly contained in
WDiff. Before starting the next pass, we R-reduce the state labels of WDiff with
respect to Set(WDiff). If WDiff now contains distinct states labelled by the same
string we connect them by epsilon arrows and replace WDiff by Weld(WDiff) (see
Remark 8.6). We then repeat this procedure until all states are labelled by distinct
strings which are irreducible with respect to Set(WDiff). If during this procedure
a state or arrow marked as needed is identified with one not marked as needed, the
resulting state or arrow is marked as needed.

Whenever a minimal rule r is encountered during the nth pass, it is adjoined
to the accepted language of WDiff. One method of doing this is to form the
rule automaton M(r) as given in Example 8.2, and replace WDiff by the result of
welding the union of the two rule automata WDiff and M(r). However, there is
a more efficient way to proceed which eliminates the need to construct M(r) and
possibly avoids the necessity for welding. We call this procedure sewing.

12.11. The sewing procedure. Suppose we have a rule $r = (u_1,v_1) \cdots (u_n,v_n)$,
with each $u_i \in A$ and each $v_i$ in the padded alphabet, which we want to add to the
language accepted by WDiff. We read the rule into WDiff from left to right, starting
at the initial state of WDiff. Suppose it is possible to read in
$(u_1,v_1) \cdots (u_k,v_k)$, arriving at a state $s_k$, where k is chosen as large as
possible, subject to the condition that $k \le n$. If $k < n$, then the arrow labelled
$(u_{k+1},v_{k+1})$ is undefined on $s_k$. We now read $(u_{k+1},v_{k+1}) \cdots (u_n,v_n)$
into WDiff, starting from the initial and final state $s_0 = t_n$ and reading from
right to left, arriving at a state $t_r$ with $r \ge k$, on which the backward arrow
labelled $(u_r, v_r)$ is undefined. We mark all states and arrows encountered as needed.

We now proceed as follows: 

1. If $k = r$ and the states $s_k$ and $t_r$ do not coincide, join them by an epsilon-
arrow, replace WDiff by Weld(WDiff), and then stop. When identifying
states with different labels during the welding procedure, we choose the
shortlex-least label for the amalgamated state.

2. If $k < r$, let $w_k$ be the label for $s_k$. Reduce $u_{k+1}^{-1} w_k v_{k+1}$, obtaining $w_{k+1}$.

3. If $w_{k+1}$ is the label of an existing state t of WDiff, set $s_{k+1} = t$.

4. If $w_{k+1}$ is not the label of an existing state, create a new state $s_{k+1}$ with
label $w_{k+1}$.

5. Create an arrow labelled $(u_{k+1},v_{k+1})$ from $s_k$ to $s_{k+1}$.

6. Mark the state $s_{k+1}$ (labelled $w_{k+1}$) and the arrow $(u_{k+1},v_{k+1})$ as needed.

7. Increment k and go to Step 12.11.1.



Note that the automaton obtained by sewing is a rule automaton. 
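
The steps of 12.11 can be sketched as follows. This is a minimal illustration,
not the authors' implementation. We assume that generators are lower-case
letters with upper-case inverses, that `$` is the padding symbol, and that
state 0 is the initial and final state. The class name `WDiff`, its fields and
the stand-in reducer `free_reduce` are all hypothetical. Welding is not
implemented: a pair of states that would be joined by an epsilon arrow is only
recorded in `pending_joins`.

```python
PAD = "$"  # assumed padding symbol for the shorter side of a rule


def free_reduce(word: str) -> str:
    """Stand-in reducer: cancel adjacent inverse pairs such as aA or Aa."""
    out = []
    for ch in word:
        if out and out[-1] == ch.swapcase():
            out.pop()
        else:
            out.append(ch)
    return "".join(out)


class WDiff:
    def __init__(self, reduce=free_reduce):
        self.reduce = reduce
        self.labels = {0: ""}      # state -> label; state 0 is initial and final
        self.by_label = {"": 0}    # label -> state
        self.fwd = {}              # (source, (u, v)) -> target
        self.bwd = {}              # (target, (u, v)) -> source
        self.needed_states = {0}
        self.needed_arrows = set()
        self.pending_joins = []    # pairs awaiting an epsilon arrow and welding

    def sew(self, rule):
        """Add the padded rule [(u_1, v_1), ..., (u_n, v_n)] to the language."""
        n = len(rule)
        # Read forwards from the initial state as far as possible.
        k, s = 0, 0
        while k < n and (s, rule[k]) in self.fwd:
            self.needed_arrows.add((s, rule[k]))
            s = self.fwd[(s, rule[k])]
            self.needed_states.add(s)
            k += 1
        # Read the remainder backwards from the final state.
        r, t = n, 0
        while r > k and (t, rule[r - 1]) in self.bwd:
            source = self.bwd[(t, rule[r - 1])]
            self.needed_arrows.add((source, rule[r - 1]))
            t = source
            self.needed_states.add(t)
            r -= 1
        # Steps 2 to 7: fill the gap between s_k and t_r.
        while k < r:
            u, v = rule[k]
            tail = "" if v == PAD else v
            w = self.reduce(u.swapcase() + self.labels[s] + tail)
            if w not in self.by_label:
                new_state = len(self.labels)
                self.labels[new_state] = w
                self.by_label[w] = new_state
            target = self.by_label[w]
            self.fwd[(s, rule[k])] = target
            self.bwd[(target, rule[k])] = s
            self.needed_states.add(target)
            self.needed_arrows.add((s, rule[k]))
            s = target
            k += 1
        # Step 1: the two ends should now be identified.
        if s != t:
            self.pending_joins.append((s, t))
```

For example, with the infinite cyclic group on the generator a, sewing the
padded rule for aA -> (empty word) with `WDiff().sew([("a", PAD), ("A", PAD)])`
creates one new state labelled A and two arrows forming a loop through the
initial state.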

13. Correctness of our Knuth-Bendix Procedure 

In this section we will prove that the procedure set out in Section 12 does what
we expect it to do.

Definition 13.1. For a discrete time t, we denote by S(t) the rules in S at time
t in our Knuth-Bendix procedure. We can take t to be the number of machine
operations since the program started, or any similar discrete measure.

Definition 13.2. A quintuple $(t, s_1, s_2, \lambda, \rho)$, where t is a time, and $s_1$, $s_2$,
$\lambda$ and $\rho$ are elements of $A^*$, is called an elementary S(t)-reduction
$u \to_{S(t)} v$ from u to v if $(\lambda, \rho)$ is a rule in S(t), $u = s_1 \lambda s_2$ and
$v = s_1 \rho s_2$. We call $(\lambda, \rho)$ the rule associated
to the elementary reduction.
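
To make Definition 13.2 concrete, the following sketch enumerates all
elementary reductions of a string with respect to a fixed finite set of rules,
standing for S(t) at one fixed time t. Strings are Python `str` values and the
names `Reduction` and `elementary_reductions` are ours.

```python
from typing import Iterator, NamedTuple


class Reduction(NamedTuple):
    s1: str      # prefix left untouched
    s2: str      # suffix left untouched
    lhs: str     # left-hand side of the associated rule
    rhs: str     # right-hand side of the associated rule
    source: str  # u = s1 + lhs + s2
    target: str  # v = s1 + rhs + s2


def elementary_reductions(u: str, rules: set) -> Iterator[Reduction]:
    """Yield every elementary reduction of u given rules {(lhs, rhs), ...}."""
    for lhs, rhs in sorted(rules):
        if not lhs:
            continue  # a rule with empty left-hand side is not meaningful here
        start = u.find(lhs)
        while start != -1:
            s1, s2 = u[:start], u[start + len(lhs):]
            yield Reduction(s1, s2, lhs, rhs, u, s1 + rhs + s2)
            start = u.find(lhs, start + 1)
```

With the single rule (ba, ab), the string baba has two elementary reductions,
one for each occurrence of ba.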

Definition 13.3. Let $t \ge 0$. By a time-t Thue path between two strings $w_1$ and
$w_2$, we mean a sequence of elementary S(t)-reductions and inverses of elementary
S(t)-reductions connecting $w_1$ to $w_2$, such that none of the rules associated to the
elementary reductions is in Delete at time t. We talk of the strings which are the
source or target of these elementary reductions as nodes. The path is considered
as having a direction from $w_1$ to $w_2$. The elementary reductions will be consistent
with this direction and will be called rightward elementary reductions. The inverses 
of elementary reductions will be in the opposite direction and will be called leftward 
elementary reductions. 

Proposition 13.4. Let $\langle A \mid R \rangle$ be the finite presentation at the start of the
Knuth-Bendix procedure. Then if $(\lambda, \rho) \in S$ during the n-th Knuth-Bendix pass, we
have $\lambda = \rho$ in the group $\langle A \mid R \rangle$.



Proof. The proof of this is an easy induction on n using Corollary 8.5. □ 



Proposition 13.5. Let $t \ge 0$ and suppose that we have a Thue path from $\alpha$ to $\beta$
in S(t) with maximum node w. Then for any time $s \ge t$, there exists a time-s Thue
path from $\alpha$ to $\beta$ with each node less than or equal to w.

Proof. We show by induction on s that, if at some time t < s there is a Thue path 
between strings u and v with all nodes no bigger than u or v, then there is also 
such a Thue path at time s. So suppose that we have proved this statement for all 
times s' < s. 

We first consider the special case where (u, v) is a rule being input to the
minimization routine (see Definition 12.7) at time t, and s is the time at the end of
the subsequent invocation of the minimization handling routine 12.9.

Each step of minimization takes an input string and outputs a possibly different 
string which is used as the input to the next step. The initial input is (u, v) and
the final output is either $(\epsilon, \epsilon)$ or a minimal rule (u', v'). Let
$r_1, r_2, \ldots, r_n$ be the sequence of outputs in the minimization of (u, v), and let
$r_0 = (u, v)$. By considering each step of minimization in turn, we will show that
for each $1 \le i \le n$, if there is a time-s Thue path between the two sides of $r_i$
with maximum node no bigger than either side, then there is a time-s Thue path
between the two sides of $r_{i-1}$
with maximum node no bigger than either side. We then obtain the desired time-s
Thue path between u and v by using descending induction on i given that the base 
case i = n is trivially true. 

To make the task of checking the proof easier, we use the same numbering here
as in Definition 12.7.



1. At the end of this step, there is a sequence of elementary reductions from
$u_1 \ldots u_{p-1}$ to u', but this may not constitute a Thue path since some of the
associated rules may be in Delete. However, any such rule $(\lambda, \rho)$ will, at some
time s' < s, have been in S but not in Delete. Therefore by our induction on
s, at the end of this step there will be a Thue path from $\lambda$ to $\rho$ with maximum
node $\lambda$. Since no rule used in this Thue path is equal to (u, v), this will still
be a Thue path at time s. Hence we can construct a time-s Thue path from
u to w with maximum node u.

2. This step is analogous to the previous step.

3. At the end of this step, the sequence of R-reductions of u to the current left- 
hand side does not use the rule {u, v) (hence the condition at the start of this 
step), and so the required Thue path exists. 

4. Suppose that the input to this step is (u'x, v). Then the output is either the
same as the input or is equal to $(u', vx^{-1})$. In the first case there is nothing to
prove. In the latter case, a time-s Thue path from u' to $vx^{-1}$ with maximum
node u' will give a time-s Thue path from u'x to $vx^{-1}x$ with maximum node
u'x. Note that there have been no deletions of rules since this particular
minimization was started. Induction on s therefore gives us a specific Thue
path from $x^{-1}x$ to $\epsilon$ at the end of this step. Moreover, no node along the Thue
path is bigger than $x^{-1}x$. In particular, the input rule to the minimization
is not used in this Thue path (this follows from either of the conditions at
the start of the step being satisfied), and so there is a time-s Thue path from
$vx^{-1}x$ to v with maximum node $vx^{-1}x$. Hence we obtain the required time-s
Thue path from u'x to v.

5. This step is analogous to the previous step. 

6. If the input to this step is (xu', xv') then the output is (u', v'). A time-s Thue 
path from u' to v' with maximum node u' yields a time-s Thue path from xu' 
to xv' with maximum node xu' . 

7. This step is analogous to the previous step. 

8. Let v' be the R-reduction of v. Immediately after this step there is a Thue 
path from v to v' with maximum node v which does not use the rule initially 
input. Using induction if necessary we have a time-s Thue path from v to v' 
with maximum node v. Hence a time-s Thue path from u to v' with maximum 
node either u or v' yields a time-s Thue path from u to v with maximum node
either u or v. 

9. If there is a Thue path from u to v with maximum node either u or v, then
the reverse of this path is a Thue path from v to u. 

This completes the induction step for the special case. Now consider the general 
case. The only reason why a Thue path at time t < s between u and v will not 
work at time s is if some elementary reduction used in this path has an associated 
rule $(\lambda, \rho)$ in S(t) which is found to be non-minimal between t and s. But in the
proof of the special case we have seen that there is a time-s Thue path between $\lambda$
and $\rho$ with no node bigger than $\lambda$. Therefore the time-t Thue path can always be
replaced by a time-s Thue path without increasing the size of the nodes. □ 

Lemma 13.6. If a string is S(s)-reducible, it is S(t)-reducible for all $t \ge s$.

Proof. If u is S(s)-reducible, there is an elementary S(s)-reduction $u \to_{S(s)} v$. This
means that v < u. By Proposition 13.5, for each time $t \ge s$, there is a Thue path
from u to v with maximum node u. The first elementary reduction in this path has
the form $u \to w$ at time t. This proves the result. □

Lemma 13.7. If $(\lambda, \rho)$ is a rule in S at some time during the n-th Knuth-Bendix
pass but before the beginning of Step 12.5.5, then $\lambda$ will be R-reducible during all
subsequent passes. If $\lambda$ is R(s)-reducible then $\lambda$ is R(t)-reducible for any $t \ge s$.

Proof. Let $(\lambda, \rho)$ be a rule as in the statement of the lemma. Then at some prior
time, $(\lambda, \rho)$ will have been a rule in S but not in Delete. Therefore for any $m \ge n$,
there will be a Thue path from $\lambda$ to $\rho$ with maximum node $\lambda$ at the beginning
of Step 12.5.5 during the m-th Knuth-Bendix pass. Now at the beginning of
Step 12.5.5, all rules in S but not in Delete will have been output by the
minimization handling routine 12.9 at some prior time during that Knuth-Bendix pass.
In particular, each of these minimal rules (u, v) will have been sewn into WDiff.
This does not, however, imply that (u, v) will be accepted by Rules at the start
of the next pass since this rule may use or define an (x, x) arrow in WDiff. Due
to some collapsing in WDiff caused by a welding operation, this may give rise to
an (x, x) arrow from the initial state to itself. Such an arrow will be removed so
that WDiff satisfies the properties 9.1. If this is the case then (u, v) will still have
some prefix or suffix accepted by WDiff and hence by Rules at the start of the next
Knuth-Bendix pass. Therefore for any $m \ge n$, $\lambda$ will have a substring which is the
left-hand side of a rule accepted by Rules[m + 1], and so $\lambda$ is R-reducible during
pass m + 1, which proves the first statement.

If $\lambda$ is R-reducible at any time during a pass, it is R-reducible at any later time
in the same pass by Lemma 12.2. We have proved that $\lambda$ is R-reducible once the
next pass starts. So this completes the proof of the last sentence in the statement
of the lemma. □ 



Lemma 13.8. At any time t, S(t) contains no duplicates. If a rule is deleted from
S, it will never be re-inserted. 



Proof. The first statement follows by looking through 12.5 and checking where 
insertions of rules take place. We always take care not to insert a rule a second 
time if it is already present. 

Let $(\lambda, \rho)$ be a rule which is deleted at time s. Deletion either takes place during
Step 12.5.2 or during the minimization handling routine 12.9. In the latter case, some
proper substring of $\lambda$ is the left-hand side of a rule in S, and this rule was present
in S at some time before the beginning of Step 12.5.5 of the Knuth-Bendix pass in
which $(\lambda, \rho)$ was deleted. Therefore by Lemma 13.7, we see that this proper substring
of $\lambda$ stays R-reducible. This means that no rule with left-hand side $\lambda$ will ever be
re-inserted.



So we assume that deletion takes place during Step 12.5.2 of the (n + 1)-th
Knuth-Bendix pass. Then between the time in the n-th Knuth-Bendix pass when
$(\lambda, \rho)$ is minimized, and the start of the next pass, it is in the subset Delete of S.
Therefore it cannot be re-inserted during pass n.

Now let m > n and suppose $(\lambda, \rho)$ has not been re-inserted before the beginning
of the m-th pass. We will prove that it cannot be re-inserted during the m-th pass.

Observe that no rules are minimized between the time r at the beginning of
Step 12.5.5 in the (m - 1)-st pass and the time t just defined. Therefore any time-r
Thue path between $\lambda$ and $\rho$ will also be a time-t Thue path. In particular, the rule
$(\lambda', \rho')$ associated to the elementary reduction of $\lambda$ is unaltered during this time.
At time r all rules in S, except for those in Delete, are minimal. In particular,
$(\lambda', \rho')$ was minimal and had been minimized at some prior point in the (m - 1)-st
pass. Therefore, $(\lambda', \rho')$ was sewn into WDiff during the (m - 1)-st pass. As in
the proof of Lemma 13.7, it is possible that $(\lambda', \rho')$ is not accepted by Rules[m],
but a substring $\lambda''$ of $\lambda'$ (and hence of $\lambda$) will be the left-hand side of an accepted
rule. If $\lambda''$ is a proper substring of $\lambda$, then $\lambda$ cannot be the left-hand side of a rule
inserted during the m-th Knuth-Bendix pass. So we need only examine the case
when $\lambda = \lambda' = \lambda''$. In this situation, $\rho$ must have been R-reduced to a strictly
smaller string during the minimization of $\lambda$. Therefore some substring of $\rho$ was the
left-hand side of a rule in S at that time. By Lemma 13.7, $\rho$ stays R-reducible.



Suppose then that $(\lambda, \rho)$ is re-inserted during the m-th Knuth-Bendix pass. This
can only happen as the result of Step 11.1.5. But for this to occur there could have
been no rule in S with left-hand side equal to $\lambda$ at that time. Since we are assuming
that $\lambda$ is the left-hand side of some rule $(\lambda, \rho'')$ not on the Delete list at the start of
the m-th pass, it follows that $(\lambda, \rho'')$ must be deleted at some point during the m-th
pass. But this can only happen if some proper substring of $\lambda$ is R-reducible during
the m-th pass. By Lemma 12.2, this proper substring of $\lambda$ must be R-reducible at
the point of re-insertion of $(\lambda, \rho)$, which is a contradiction. □

Definition 13.9. We say that a string u is permanently irreducible if there are
arbitrarily large times t for which u is S(t)-irreducible. By Lemma 13.6 this is
equivalent to saying that u is S(t)-irreducible at all times $t \ge 0$. A rule $(\lambda, \rho)$
in S is said to be permanent if $\rho$ and every proper substring of $\lambda$ is permanently
irreducible.

Lemma 13.10. A permanently irreducible string is permanently R-irreducible. A 
permanent rule of S is never deleted. A permanent rule is accepted by Rules[n + 1]
provided it is present in S when the n-th Knuth-Bendix pass begins (and is accepted
by Rules[m] for all m > n).

Proof. Let u be permanently irreducible. R-reduction of u can only take place if, 
immediately afterwards, some substring of u is S-reducible. This is impossible by 
hypothesis. 



A rule $(\lambda, \rho)$ is deleted only as a result of minimization. By Proposition 13.5, there
would have to be a Thue path from $\lambda$ to $\rho$ with largest node $\lambda$. The first elementary
reduction must therefore be rightward (see Definition 13.3), say $\lambda \to_{S(t)} \mu$. Since every
proper substring of $\lambda$ is permanently R-irreducible, this first elementary reduction
must be associated to a rule $(\lambda, \mu)$.

This is only possible if, when $(\lambda, \rho)$ was input to the minimization routine, $\rho$ was
R-reducible. However, it is permanently R-irreducible, which is a contradiction.

It follows that if $(\lambda, \rho)$ is present at the start of the n-th Knuth-Bendix pass, it
will be sewn into WDiff at some point during the n-th Knuth-Bendix pass. As in
the proof of Lemma 13.7, the only way $(\lambda, \rho)$ would not be accepted by Rules[n + 1] is
if some proper prefix or suffix is accepted by Rules[n + 1]. But this would contradict
$(\lambda, \rho)$ being a permanent rule. Therefore, $(\lambda, \rho)$ is accepted by Rules[m] for each
m > n. □

Lemma 13.11. Let u be a fixed string. Then there is a $t_0$ depending on u, such
that, for all $t \ge t_0$, each elementary S(t)-reduction of u is associated to a permanent
rule. If all proper substrings of u are permanently irreducible, then, for $t \ge t_0$, there
is at most one elementary reduction of u, and this is associated to a permanent rule
(u, w).

Proof. There are only finitely many substrings of u. So we need only show that,
given any string v, there is a $t_0$ such that for all $t \ge t_0$, each rule in S(t) with
left-hand side v is permanent. If there is a proper substring of v which is not
permanently irreducible, then at some time $s_0$ it becomes $S(s_0)$-reducible. By
Lemma 13.6, it is S(s)-reducible for $s \ge s_0$. By Lemma 13.7, it becomes R-reducible
at the beginning of the next Knuth-Bendix pass after $s_0$. During this pass all rules
with left-hand side v will be deleted. Also, since this proper substring of v is now
permanently R-reducible, no rule with left-hand side equal to v will ever be inserted
subsequently. In this case the lemma is true since ultimately there are no rules with
left-hand side v.

So we assume that each proper substring of v is permanently irreducible, and
that v itself is S-reducible at some time t. A rule (v, w) will be permanent if w is
permanently irreducible. Otherwise it will disappear as a result of minimization
and, by Lemma 13.8, never reappear. There cannot be two permanent rules $(v, w_1)$
and $(v, w_2)$ with $w_1 > w_2$. For critical pair analysis would produce a new rule
$(w_1, w_2)$ during the next Knuth-Bendix pass, and so $w_1$ would not be permanently
irreducible. □



Theorem 13.12. Let u be a fixed string in A* and let v be the smallest element 
in its Thue congruence class. Then, for large enough times, there is a chain of 
elementary reductions from u to v each associated to a permanent rule. After 
enough time has elapsed, R-reduction of u always gives v. 
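
As a small illustration of the last sentence, consider a case where the set of
permanent rules is finite and known in advance. The sketch below assumes the
free abelian group on a and b, with upper-case letters for inverses and the
generator order a < A < b < B, together with the eight rules commonly given as
a complete shortlex system for that group. It uses naive leftmost string
rewriting, not the automaton-based R-reduction of this paper, and the names
`RULES` and `reduce_word` are ours.

```python
# Assumed finite confluent rule set for Z^2 = <a, b | ab = ba>.
RULES = {
    "aA": "", "Aa": "", "bB": "", "Bb": "",
    "ba": "ab", "bA": "Ab", "Ba": "aB", "BA": "AB",
}


def reduce_word(word: str, rules: dict = RULES) -> str:
    """Apply elementary reductions until none applies.

    Every rule replaces a string by a shortlex-smaller one, so the loop
    terminates.
    """
    while True:
        best = None  # (position, lhs, rhs) of the leftmost applicable rule
        for lhs, rhs in rules.items():
            position = word.find(lhs)
            if position != -1 and (best is None or position < best[0]):
                best = (position, lhs, rhs)
        if best is None:
            return word
        position, lhs, rhs = best
        word = word[:position] + rhs + word[position + len(lhs):]
```

Here the commutator baBA reduces to the empty word, and bbaa reduces to aabb,
the shortlex-least representative of the same group element.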

Proof. We start by proving the first assertion. By hypothesis, we have, for each
time t, a time-t Thue path $p_t$ from u to v, and we can suppose that $p_t$ contains no
repeated nodes by cutting out part of the path if necessary. The only reason why
we couldn't take $p_{t+1}$ to be $p_t$ is if some rule $(\lambda, \rho)$, used along the Thue path $p_t$,
is deleted at time t. By Proposition 13.5 we can, however, assume that each node of
$p_{t+1}$ is either already a node of $p_t$ or is smaller than some node of $p_t$.

Let $h_0$ be the largest node on $p_0$, and suppose that we have already proved the
theorem for all pairs u and v which are connected by a Thue path with largest node
smaller than $h_0$. By induction, using Proposition 13.5, we can assume that $h_0$ is
the largest node on $p_t$ for all time t. If $v = h_0$ then since v is the smallest element
in its congruence class, there are no elementary reductions starting from v, and we
must have u = v in this case.



By Lemma 13.11, we may assume that $t_0$ has been chosen with the property
that, for all strings $w \le h_0$ and for all $t \ge t_0$, all elementary S(t)-reductions of
w are associated to permanent rules which are accepted by Rules[n] provided n is
sufficiently large.

Let $h_0 = \mu_t \alpha_t \nu_t \to_{S(t)} \mu_t \beta_t \nu_t$ be the rightward elementary reduction of $h_0$ at
time t. The rule $(\alpha_t, \beta_t)$ is independent of t for large values of t. Then $(\alpha_t, \beta_t)$ is
permanent and $\alpha_t$ is R-reducible for large enough t. If $u \ne h_0$, the same argument
applies to the unique elementary leftward reduction with source $h_0$ at time t.

If $h_0 = u$, let $u \to_{S(t)} w$ be the first rightward elementary reduction for large
values of t. By our induction hypothesis, there is a Thue path of elementary
reductions from w to v, each associated to a permanent rule, and with no node larger
than w, and so we have the required Thue path from u to v.

Suppose now that $h_0 \ne u$, so that we get two permanent rules, associated to
the leftward and rightward elementary reductions of $h_0$. If the two elementary
reductions are identical, that is, if the two permanent rules are equal and if their
left-hand sides occur in the same position in $h_0$, then $p_t$ contains a repeated node
which we are assuming not to be the case. So the two elementary reductions occur
in different positions in $h_0$. Now choose t to be large enough so that the two rules
concerned have already been compared in a critical pair analysis in Step 12.5.5.b
during some previous n-th Knuth-Bendix pass.

Figure 5. Removing the node $h_0$ when the leftward and rightward
reductions are obtained from rules having disjoint left-hand sides.

If these two rules have left-hand sides which are disjoint substrings of $h_0$, then
we can interchange their order so as to obtain a Thue path from u to v where all
nodes are strictly smaller than $h_0$ (see Figure 5). The first assertion of the theorem
then follows by the induction hypotheses in this particular case.
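
The interchange in the disjoint case can be written out explicitly. The
following is a worked illustration in our own notation: $a$, $b$, $c$ are the
pieces of $h_0$ outside the two left-hand sides, $(\lambda_1, \rho_1)$ is the rule of
the leftward reduction and $(\lambda_2, \rho_2)$ that of the rightward reduction. An
arrow points from the source of an elementary reduction to its target.

```latex
\begin{align*}
h_0 &= a\,\lambda_1\,b\,\lambda_2\,c, \\
x &= a\,\rho_1\,b\,\lambda_2\,c, \qquad
y = a\,\lambda_1\,b\,\rho_2\,c, \qquad
z = a\,\rho_1\,b\,\rho_2\,c, \\
\text{old segment of the path:}\quad & x \longleftarrow h_0 \longrightarrow y, \\
\text{new segment of the path:}\quad & x \longrightarrow z \longleftarrow y,
\qquad z < x < h_0, \quad z < y < h_0 .
\end{align*}
```

The new segment uses the same two rules in the opposite order, and each of its
nodes is strictly smaller than $h_0$ because every elementary reduction
decreases a string in the ordering.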

If the two left-hand sides do not correspond to disjoint substrings of $h_0$ then,
by assumption, there is some time t' < t, such that a critical pair (u', v', w') was
considered. Here $u' \to_{S(t')} v'$ and $u' \to_{S(t')} w'$ are elementary S(t')-reductions given
by the two rules, and u' is a substring of $h_0$. After the critical pair analysis, at
time t'' < t, the Thue paths illustrated in Figure 6 are possible. As a consequence
of Proposition 13.5, it is straightforward to see that for all times $s \ge t''$, v' and w'
can be connected by a time-s Thue path in which all nodes are no larger than the
largest of v' and w'. In particular, this applies at time t so that the targets of the
two elementary S(t)-reductions from $h_0$ can be connected by a time-t Thue path in
which all nodes are smaller than $h_0$. This completes the inductive proof of the first
assertion of the theorem.

We have arranged that t is large enough so that, for all $w \le u$, all elementary S(t)-
reductions of w are associated to permanent rules, and such a w can be permanently
R-reduced to the least element in its Thue congruence class. It follows that such
a w is R-irreducible if and only if it is minimal in its Thue class. In particular
R-reduction of u must give v. □



Corollary 13.13. The set of permanent rules in R is confluent. The set of such
rules is equal to $P = \bigcap_{t} \bigcup_{s \ge t} S(s)$. A string u is smallest in its Thue
congruence class if and only if it is permanently irreducible.



Proof. The first and third statements are obvious from Theorem 13.12. For the
second statement, each permanent rule is contained in P by Lemma 13.10.
Conversely, if we have a rule r in S which is not permanent, then for all sufficiently
large times s either its right-hand side or a proper substring of its left-hand side is
S(s)-reducible. Theorem 13.12 ensures that this reducible string is R(s)-reducible
for all sufficiently large times s. Therefore r will be minimized and deleted from S.
Hence from Lemma 13.8 we see that r is not contained in P. □

Figure 6. When the leftward and rightward reductions from
$h_0$ are obtained from rules $(\lambda_1, \rho_1)$ and $(\lambda_2, \rho_2)$ having
overlapping left-hand sides, this diagram shows the time-t'' Thue
paths that exist after the critical pair associated with the triple
$(u'_1 u'_2 u'_3,\ \rho_1 u'_3,\ u'_1 \rho_2)$ has been resolved to the pair $(z_1, z_2)$.



The next result is the main theorem of this paper. 

Theorem 13.14. Let G be a group with a given finite presentation and a given 
ordering of the generators and their inverses. Suppose that G is shortlex-automatic.
Then the procedure given in 12.5 will stabilize at some $n_0$ with Rules[n + 1] =
Rules[n] if $n \ge n_0$. P is then the language of a certain two-variable finite state
automaton and the automaton can be explicitly constructed.

Proof. P consists of pairs of strings $(\lambda, \rho)$ giving a valid identity in G (by
Proposition 13.4), and where $\rho$ and all proper substrings of $\lambda$ are permanently
irreducible. A string is permanently irreducible if and only if it is the unique
shortlex representative of the corresponding group element. Since this structure is
automatic, there are only finitely many word differences and arrows generated by the
rules in P. If we therefore weld together the automata M(r) corresponding to the
rules r in P, we obtain a finite rule automaton Rules. For each arrow in this
automaton, we can pick a specific rule r which makes use of the arrow when it is read
into Rules.

These specific rules will eventually be generated by our Knuth-Bendix process.
Such a rule is never deleted once it is generated, since it is permanent. So eventually
Rules[n] will contain Rules as a sub-automaton. But once this has happened,
R-irreducible will be equivalent to shortlex-minimal. Therefore all non-permanent
rules will be removed during the next pass, and the redundant states and arrows
of Rules[n] will be removed. Rules[m] is then constant for m > n. □

Of course, the problem with the above result is that we do not currently have
any method of knowing when we have reached $n_0$. It might be possible to prove
that this question is undecidable if one varies over all shortlex presentations of
shortlex-automatic groups. It might also be undecidable if one varies over all finite
presentations of word hyperbolic groups.



14. Miscellaneous details 

In this section we present a number of points which did not seem to fit elsewhere 
in this paper. 

14.1. The structure of S. This set is given in a data structure arranged so that it 
is quick to find a rule in it, given only the left-hand side, quick to delete a specified 
rule and quick to add a rule. All these operations take place repeatedly in the 
Knuth-Bendix program. It is also an advantage to have a robust enough method 
for iterating through S, so that the process is not disrupted if rules are added or 
deleted while the iteration is proceeding. (We don't mind if the iteration fails to 
catch the rules added during iteration.) 
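
One way to meet these requirements is sketched below. It is an illustration
only, with the class name `RuleSet` and its methods chosen by us: rules are
indexed by left-hand side in a hash table, and iteration walks over a snapshot,
skipping rules deleted in the meantime and ignoring rules added in the
meantime. A space-conscious implementation would avoid the snapshot, for
instance by using a linked list with deferred unlinking.

```python
class RuleSet:
    """Rules (lhs, rhs) with fast lookup by left-hand side."""

    def __init__(self):
        self._by_lhs = {}  # lhs -> dict used as an ordered set of rhs

    def add(self, lhs, rhs):
        """Insert a rule; return False if it is already present."""
        bucket = self._by_lhs.setdefault(lhs, {})
        if rhs in bucket:
            return False
        bucket[rhs] = None
        return True

    def delete(self, lhs, rhs):
        """Remove a rule; return False if it was not present."""
        bucket = self._by_lhs.get(lhs)
        if bucket is None or rhs not in bucket:
            return False
        del bucket[rhs]
        if not bucket:
            del self._by_lhs[lhs]
        return True

    def find(self, lhs):
        """Return all right-hand sides stored for this left-hand side."""
        return list(self._by_lhs.get(lhs, ()))

    def __contains__(self, rule):
        lhs, rhs = rule
        return rhs in self._by_lhs.get(lhs, ())

    def __iter__(self):
        # Iterate over a snapshot so that additions and deletions made by
        # the caller during the loop cannot disrupt it.
        snapshot = [(lhs, rhs)
                    for lhs, bucket in self._by_lhs.items()
                    for rhs in bucket]
        for rule in snapshot:
            if rule in self:  # skip rules deleted since the snapshot
                yield rule
```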



14.2. Aborting. It is possible that we come to a situation where the procedure 
is not noticing that certain strings are reducible, even though the necessary infor- 
mation to show that they are reducible is already in some sense known. It is also 
possible that reduction is being carried out inefficiently, with several steps being 
necessary, whereas in some sense the necessary information to do the reduction in 
one step is already known. An indication that our procedure could be improved
is that WDiff is constantly changing, with two states being identified and
consequent welding, or with new states or arrows being added. In this case it might be
advisable to abort the current Knuth-Bendix pass.

To see if abortion is advisable, we can record statistics about how much WDiff
has changed since the beginning of a pass. If the changes seem excessive, then the
pass is aborted. A convenient place for the program to decide to do this is just
before another rule from New is examined at Step 12.5.4.

If an abort is decided upon then all states and arrows of WDiff are marked as
needed. At this point the program jumps to Step 12.5.1.
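
A possible shape for such statistics is sketched below. The text does not say
which quantities are recorded or what counts as excessive, so the counters and
the threshold of 25% used here are purely our assumptions, as is the class
name `ChangeStats`.

```python
from dataclasses import dataclass


@dataclass
class ChangeStats:
    """Changes to WDiff since the beginning of the current pass."""

    size_at_start: int      # states plus arrows when the pass began
    states_added: int = 0
    arrows_added: int = 0
    welds: int = 0          # identifications of states followed by welding

    def changes(self) -> int:
        return self.states_added + self.arrows_added + self.welds

    def should_abort(self, threshold: float = 0.25) -> bool:
        """Hypothetical criterion: abort once changes exceed a fixed
        fraction of the size of WDiff at the start of the pass."""
        return self.changes() > threshold * max(self.size_at_start, 1)
```

The check would be made just before the next rule is taken from New.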



14.3. Priority rules. A well-known phenomenon found when using Knuth-Bendix 
to look for automatic structures, is that rules associated with finding new word 
differences or new arrows in WDiff should be used more intensively than other rules. 
Further aspects of the structure are then found more quickly. These observations 
are not the consequence of a theorem — they are observed when programs are run. 

A new rule associated with new word differences or new arrows is marked as a 
priority rule. If a priority rule is minimized, the output is also marked as a priority 
rule. If a priority rule is added to one of the lists Considered, This or New, it is 
added to the front of the list, whereas rules are normally added to the end of the 
list. Just before deciding to add a priority rule to New, we check to see if the rule 
is minimal. If so, we add it to the front of This instead of to the front of New. 



When a rule is taken from This at Step 12.5.5 during the main loop, it is normally
compared with all rules in Considered, looking for overlaps between left-hand sides.
In the case of a priority rule, we compare left-hand sides not only with rules in
Considered, but also with all rules in This. If a normal rule $(\lambda, \rho)$ is taken from This
and comparison with a rule in Considered gives rise to a priority rule, then the rule
$(\lambda, \rho)$ is also marked as a priority rule. It is then compared with all rules in This,
once it has been compared with all rules in Considered.
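
The queueing discipline for priority rules can be sketched with double-ended
queues. This is a minimal illustration: the function name `enqueue` and the
callback `is_minimal` are ours, and only the lists This and New are modelled.

```python
from collections import deque


def enqueue(rule, priority, this, new, is_minimal):
    """Queue a freshly found rule.

    Normal rules go to the end of New. A priority rule goes to the front
    of This if it is already minimal, and to the front of New otherwise.
    """
    if not priority:
        new.append(rule)
    elif is_minimal(rule):
        this.appendleft(rule)
    else:
        new.appendleft(rule)
```

A deque gives constant-time insertion at both ends, which is what the two
insertion positions require.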

Treating some rules as priority rules makes little difference unless there is a 
mechanism in place for aborting a Knuth-Bendix pass when WDiff has sufficiently 
changed. If there is such a mechanism, it can make a big difference. 

14.4. An efficiency consideration. During reduction we often have a state
s in a two-variable automaton. We usually know $x \in A$ and we are looking for an
arrow labelled (x, y) with certain properties, where y lies in the padded alphabet.
It therefore makes a big difference if the arrows with source s are arranged so that
we have rapid access to arrows labelled (x, y) if x is given.
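
One simple arrangement with this property is a nested table indexed first by
the source state and then by x. The sketch below is an illustration with names
of our own choosing (`ArrowIndex`, `candidates`). A compact implementation
would more likely use sorted arrays than hash tables.

```python
class ArrowIndex:
    """Arrows of a two-variable automaton, grouped by (source, x)."""

    def __init__(self):
        self._out = {}  # source -> x -> y -> target

    def add(self, source, x, y, target):
        self._out.setdefault(source, {}).setdefault(x, {})[y] = target

    def candidates(self, source, x):
        """Return all (y, target) pairs for arrows (x, y) leaving source."""
        return list(self._out.get(source, {}).get(x, {}).items())
```

Given s and x, the search for a suitable y then touches only the arrows whose
first component is x.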



15. The past and the future 

15.1. A failed idea. Our original idea was to avoid having an explicit finite set 
of rules S. Instead we tried to attach extra information to the states and arrows 
of our automata so that the set of rules implicitly held included both a finite set, 
corresponding to our current S, and the possibly infinite set held by the automata. 
The idea was to avoid using the considerable amount of space used by S. This idea 
did not work and we now explain why. 

The idea was that it didn't matter too much if the finite set of rules held implicitly 
was too big. The logic of Knuth-Bendix only goes wrong if it is too small. However, 
if the extra information attached to states and arrows is not sufficiently explicit, 
there is often a huge growth from one pass to the next in the finite set of rules 
implicitly held. This growth is not caused by the Knuth-Bendix process itself, but 
is a by-product of the way we are using the extra information to specify the finite 
set of rules. 

Another approach is then to attach much more information to states and arrows 
in an attempt to limit the unnecessary growth referred to in the previous paragraph. 
But this extra information itself requires a lot of space, more even than holding the 
rules separately! Moreover, it turned out not to be possible to conveniently limit 
the growth as much as was necessary. So holding more information in the finite 
state automata was worse on all counts, including the complexities of writing the 
code, than the simpler scheme of holding the rules separately. 

15.2. The present. Many of the ideas in this paper have been implemented in
C++ by the second author. But some of the ideas in this paper only occurred to us
while the paper was being written, and the procedures and algorithms presented in 
this paper seem to us to be substantial improvements on what has been implemented 
so far. An unfortunate result of this is that we are unable to present experimental 
data to back up our ideas, although many of these ideas have been explored in 
depth with actual code. Our experimental work has been essential in enabling us 
to come to the better algorithms which are presented here. 

15.3. Comparison with kbmag. Here we describe the differences between our
ideas and the ideas in Derek Holt's kbmag programs. These programs try to
compute the shortlex-automatic structure on a group. Our program is a substitute
only for the first program in the kbmag suite of programs.

In kbmag, fast reduction is carried out using an automaton with a state for every 
prefix of every left-hand side. In our program we also keep every rule. However, the 
space required by a single character in our program is less by a constant multiple 
than the space required for a state in a finite state automaton. Moreover, compres- 
sion techniques could be used in our situation so that less space is used, whereas 
compression is not available in the situation of kbmag. 

The other large objects in our set-up are the automata P(Rules[n]) and
Q(Rules[n]) defined in earlier sections. In kbmag, there has also to be
an automaton like P(Rules[n]), and it is possible to arrange that this automaton is
only constructed after the Knuth-Bendix process is halted. This can avoid running
out of space. In kbmag there is no analogue of our Q(Rules[n]).

In kbmag, reduction is carried out extremely rapidly. However, as new rules 
are found, the automaton in kbmag needs to be updated, and this is quite time- 
consuming. In our situation, updating the automata is quick, but reduction is 
slower because the string has to be read into two different automata. Moreover 
we sometimes need to use the method of Section |l^ which is slower (by a constant 
factor) than simply reading a string into a deterministic finite state automaton. 

In kbmag, there is a heuristic, which seems to be inevitably arbitrary, for decid- 
ing when to stop the Knuth-Bendix process. In our situation there is a sensible 
heuristic, namely we stop if we find Rules[n + 1] = Rules[n]. 

In the case of kbmag, there are occasional cases where the process of finding the 
set of word differences oscillates indefinitely. This is because redundant rules are 
sometimes unavoidably introduced into the set of rules, introducing unnecessary 
word differences. Later the redundant rules are eliminated, together with the 
corresponding word differences. This oscillation can continue indefinitely. Holt has tackled this 
problem in his programs by giving the user interactive modes of running them. 

In our case, the results in Section show that, given a shortlex-automatic group, 
the automaton Rules[n] will eventually stabilize, given enough time and space. 

We believe that the main advantage of our approach will only become evident 
when looking at very large examples. We plan to carry out a systematic examination 
of shortlex-automatic groups generated by Jeff Weeks' SnapPea program (see [ p^) 
in order to carry out a systematic comparison. 

15.4. Other situations. We should remark that our methods should apply with 
some modifications to certain other orderings, not only to the shortlex ordering. 
The essential feature we need is that the set of pairs (u,v), such that u > v, is 
a regular language. Other orderings than shortlex, for example an ordering called 
the wreath product ordering, have been useful in theoretical discussions 0. The 
wreath product ordering is used in programs by Holt which look for coset automatic 
structures. 
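
For the shortlex ordering itself, the regularity of the set of pairs (u,v) with u > v can be made concrete. The following Python sketch builds a five-state deterministic automaton that reads u and v in parallel, with the shorter word padded at the end, and accepts exactly when u > v. The padding symbol "$", the state names and the function names are assumptions made for this illustration and do not come from our programs.

```python
from itertools import zip_longest

PAD = "$"  # assumed padding symbol; it must not be a letter of the alphabet


def build_shortlex_greater(alphabet):
    """Two-variable DFA accepting padded pairs (u, v) with u > v in shortlex.

    alphabet lists the letters in increasing order.
    Returns (transition table, initial state, set of accepting states).
    """
    rank = {letter: i for i, letter in enumerate(alphabet)}
    delta = {}
    for state in ("eq", "gt", "lt"):          # outcome of the first difference so far
        for x in alphabet:
            delta[state, (x, PAD)] = "u_longer"   # v has ended first, so u > v
            delta[state, (PAD, x)] = "v_longer"   # u has ended first, so u < v
            for y in alphabet:
                if state != "eq":
                    delta[state, (x, y)] = state  # first difference already decided
                elif x == y:
                    delta[state, (x, y)] = "eq"
                else:
                    delta[state, (x, y)] = "gt" if rank[x] > rank[y] else "lt"
    for x in alphabet:
        delta["u_longer", (x, PAD)] = "u_longer"
        delta["v_longer", (PAD, x)] = "v_longer"
    return delta, "eq", {"gt", "u_longer"}


def shortlex_greater(u, v, automaton):
    """Run the automaton on the padded pair (u, v)."""
    delta, state, accepting = automaton
    for pair in zip_longest(u, v, fillvalue=PAD):
        state = delta[state, pair]
    return state in accepting
```

With the alphabet "ab", ordered so that a < b, the automaton accepts ("ba", "ab") and ("aaa", "bb") and rejects ("ab", "ba"), ("b", "aa") and ("ab", "ab"). An ordering for which no finite automaton of this kind exists would fall outside the scope of the remark above.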

Bill Thurston has suggested that we generalize our programs to apply directly to 
a triangulated space rather than to a group. It should be straightforward to make 
this generalization in both the kbmag programs and in ours. 